Pathology Informatics Questions and Answers From the University of Pittsburgh Pathology Residency Informatics Rotation

Harrison, James H Jr

Context.-Effective pathology practice increasingly requires familiarity with concepts in medical informatics that may cover a broad range of topics, for example, traditional clinical information systems, desktop and Internet computer applications, and effective protocols for computer security. To address this need, the University of Pittsburgh (Pittsburgh, Pa) includes a full-time, 3-week rotation in pathology informatics as a required component of pathology residency training.

Objective.-To teach pathology residents general informatics concepts important in pathology practice.

Design.-We assess the efficacy of the rotation in communicating these concepts using a short-answer examination administered at the end of the rotation. Because the increasing use of computers and the Internet in education and general communications prior to residency training has the potential to communicate key concepts that might not need additional coverage in the rotation, we have also evaluated incoming residents’ informatics knowledge using a similar pretest.

Data Sources.-This article lists 128 questions that cover a range of topics in pathology informatics at a level appropriate for residency training. These questions were used for pretests and posttests in the pathology informatics rotation in the Pathology Residency Program at the University of Pittsburgh for the years 2000 through 2002. With slight modification, the questions are organized here into 15 topic categories within pathology informatics. The answers provided are brief and are meant to orient the reader to the question and suggest the level of detail appropriate in an answer from a pathology resident.

Results.-A previously published evaluation of the test results revealed that pretest scores did not increase during the 3-year evaluation period, and self-assessed computer skill level correlated with pretest scores, but all pretest scores were low. Posttest scores increased substantially, and posttest scores did not correlate with the self-assessed computer skill level recorded at pretest time.

Conclusions.-Even residents who rated themselves high in computer skills lacked many concepts important in pathology informatics, and posttest scores showed that residents with both high and low self-assessed skill levels learned pathology informatics concepts effectively.

This article contains 128 questions that were used in pretests and posttests for a mandatory 3-week, full-time rotation in pathology informatics within the pathology residency program at the University of Pittsburgh (Pittsburgh, Pa). An overview of the rotation and an evaluation of the results of the pretests and posttests have been published previously.1 Most of the questions have short answers and are meant to elicit responses consisting of no more than a few sentences, a list, or a labeled diagram. For convenience in lookup and comparison, the questions have been organized into categories corresponding to major informatics topics (see Table).

Each test consisted of about 30 questions to be completed within 2 hours. Grading allowed half credit for questions that were partially correct. Scores averaged 15% to 20% for the pretest and about 70% for the posttest during the 3 years (2000-2002). The minimum passing score was 50%. Pretests were given on the first day of the pathology informatics rotation and posttests were given on the last day (about 3 weeks later). The rotation was scheduled in July at the beginning of the second year for most residents.


This list of questions is meant to be a guide to the breadth and depth of topics in a successful pathology informatics rotation, and the questions may also be useful as a guide to testing in informatics rotations in other training programs. The answers are supplied here to orient the reader to the intent of the question and the desirable detail level of a representative response in our program. They are not meant to be complete or to be used as a comprehensive study guide in pathology informatics.

Although our rotation broadly covered pathology informatics, we emphasized topics that were not encountered as part of routine clinical rotation information system experience in pathology training, such as data formats, interfaces, enterprise system architecture and evaluation, various coding schemes, networking, markup, etc. Exposure to these topics in our program occurred primarily during the didactic portion of this rotation and later during elective projects. In contrast, basic information about anatomic pathology (AP) and clinical pathology laboratory information system (LIS) operation was taught as part of other pathology rotations during the first and second years of training. For that reason, this list of questions is weighted toward general enterprise computing and general data management topics at the expense of pathology-specific information systems. A detailed overview of the University of Pittsburgh pathology informatics rotation is provided elsewhere.1 Other programs should take their particular training environment into account in planning their informatics curricula.

Questions on topics that are covered in the rotation but are distinct from pathology informatics, such as statistical process control (eg, Westgard QC), are omitted from this list.


General Personal Computer Hardware and Troubleshooting

1. You boot your personal computer (PC), and when the monitor comes up you see rapidly scrolling horizontal and vertical lines that occupy approximately 20% of the central portion of the screen. What is the most probable cause of the problem?

Answer: The display settings have probably been changed to a resolution that is incompatible with this monitor. Reboot and choose the appropriate option from the startup menu (eg, safe mode in Windows), which will generally start up with a default 640 × 480 resolution compatible with most monitors. Then go to the display control panel, click on the Settings tab, and choose an appropriate resolution.

2. You have been away on vacation and return home, reboot your PC, and find that on boot the following error message is displayed: “Hard drive 0 not found.” What is the most probable cause of this problem?

Answer: The computer’s battery (also called the CMOS, BIOS, or motherboard battery) has probably discharged, so the BIOS has lost its stored hard drive settings; the battery should be replaced. This is more likely in a computer that is several years old.

3. What part of a computer is the primary storage area for data in use and loses its contents when the computer is turned off?

Answer: Random access memory (RAM).

4. You have 300 MB of data (images, text, etc) that you are using for a paper you are writing. To work on the paper at home, you decide to burn all of the data to a CD. You select the files and write the CD using a CD rewritable drive. You finalize the session when completed. When you arrive at home you find that you cannot read or open the CD on your CD/digital video disk (DVD) drive. What went wrong?

Answer: The CD was probably removed from the original drive without closing the session. If the session is not closed, the CD will not be readable on a read-only CD drive.

5. You are comparing 2 computers for purchase, one with a 900-MHz processor and one with a 700-MHz processor. In trying them out with several programs, including word processing and a database program, you notice that there does not seem to be much difference in speed between the two. How might you explain this?

Answer: The difference between 700- and 900-MHz processors for standard business applications (not games) is relatively small and may not be noticeable in routine use. Also, if the processors are of different design, direct comparison of clock rates is not appropriate. In cases in which the processors are of similar design, other components also contribute to the overall speed of the computer, such as the disk drive speed, bus speed, and amount of RAM.

6. What component of a computer connects the memory, the processor, ancillary chip sets, and interface cards?

Answer: The system bus (motherboard is also acceptable, although not quite as good an answer).

7. Under what circumstances is the central processing unit clock speed (eg, 800 MHz) not a good measure for comparing the relative capabilities of computer processors?

Answer: When the processors being compared are of different internal design.
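Clock rate alone ignores how much work each clock cycle accomplishes. A rough model multiplies clock rate by the average number of instructions completed per cycle (IPC). The sketch below uses made-up IPC values for 2 hypothetical processors of different design, and it ignores the disk, bus, and memory effects that the answer to question 5 notes can matter as much in practice.

```python
def instructions_per_second(clock_mhz: float, ipc: float) -> float:
    """Rough throughput model: clock cycles per second x instructions per cycle."""
    return clock_mhz * 1_000_000 * ipc

# Hypothetical processors; the IPC values are illustrative, not measured.
higher_clock = instructions_per_second(clock_mhz=900, ipc=1.0)
lower_clock = instructions_per_second(clock_mhz=700, ipc=1.5)

print(f"900 MHz, IPC 1.0: {higher_clock:,.0f} instructions/s")
print(f"700 MHz, IPC 1.5: {lower_clock:,.0f} instructions/s")
# In this model the 700-MHz design completes more instructions per second.
```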

General Software, Operating Systems, and Applications

8. What is the recommended minimum font size for a PowerPoint presentation?

Answer: Eighteen points.

9. What class of fonts is most often used for PowerPoint presentations and headings in printed text? What class of fonts is most often used for body text in printed material?

Answer: Headings (including short PowerPoint slide bullets) often use sans serif fonts. Large blocks of small text, such as document body text in printed documents, are easier to read if serif fonts are used.

10. What is the difference between serif and sans serif fonts? How are each typically used?

Answer: Serif fonts have horizontal and/or vertical lines/ornamentations on their ascending and descending parts and usually have variable line weight. These features help mark the visual boundaries of a line of text and make long lines and small font sizes easier to read. Sans serif fonts are typically plain with constant line weight. This makes them stand out visually when they are used in headings, titles, and other short text sequences.

11. What is a PowerPoint template?

Answer: PowerPoint templates are predesigned slides containing graphics and color schemes to which text can be added to complete a presentation. A number of these templates are supplied with PowerPoint. PowerPoint also comes with several templates that include text suitable for standard business presentations, which can be modified to suit a particular presentation of that type.

12. What is the relationship between line length and line spacing (leading)?

Answer: Long text lines (ie, lines extending the full width of a page or slide) need wider spacing (leading) to maintain readability. Short lines (eg, narrow columns such as newspaper columns) can be spaced more tightly and still maintain acceptable readability.

13. What component of a PowerPoint presentation can be used to hold logos or text that appears on every slide?

Answer: The slide master.

14. What is the difference between an operating system and an application program?

Answer: An operating system manages the computer, provides an environment for running application programs, and usually gives the computer its “personality” through management of the general user interface. It is always running when the computer is operating. Application programs are usually run by the user to carry out specific tasks of interest to the user (eg, word processing, spreadsheets, or graphics programs; continuously running server programs also fall into this category) and can be started and stopped by the user or server manager.

15. You have downloaded from the Internet a specification document for a new coagulation analyzer. The document is in American Standard Code for Information Interchange (ASCII) text and its icon is a generic blank page. How do you view this document in Microsoft Word so that you can print it for your hematology laboratory supervisor?

Answer: Open Word, choose “Open” from Word’s file menu, move to the directory holding the file, choose “All readable files” from the file types menu, and choose the file from the displayed list.

16. Define “multitasking.”

Answer: Running multiple applications simultaneously under one operating system.

17. List 3 tasks usually carried out by the operating system.

Answer: Memory management, user interface management, and file system management; more specific alternatives, such as network communications, communication with device drivers (and thus peripheral equipment), multitasking, and virtual memory, are also acceptable.

18. You are off to your first big international meeting to present your research. You have slaved for weeks preparing a PowerPoint presentation. You used lots of the new features in PowerPoint XP (2001) and want to ensure that they will all work on the PC that you will be presenting on. What do you do to ensure that all goes well?

Answer: Save the presentation in “Pack-and-Go” format, which includes all fonts and image files used in the presentation and is also self-displaying (PowerPoint does not need to be installed on the presentation machine).

19. Give an example of an operating system.

Answer: Choose from Windows, Mac OS, Linux, or Unix; others, such as VMS and more specific versions of Unix (AIX, BSD, etc), are also acceptable.

20. A university Web site offers several PowerPoint presentations for downloading that may be used for laboratory technician in-service training. You decide to download one for evaluation using your Web browser. After the download, you have a file on your desktop named “Lecture1” with a blank page icon. What will happen if you double-click the file? What should you do next to determine whether you can open this file?

Answer: As first steps, make sure that the site the file came from is a trusted site and that you know what you downloaded. Then make sure the file is not compressed and does not need to be otherwise decoded prior to use (you may need to obtain a special program for this). If you double-click the file, an “Unknown file type” box will appear. From this box you can choose PowerPoint to open the file. You can also open PowerPoint first and use PowerPoint’s Open command to go to the directory containing the file, choose “All types” from the file type pop-up menu, and pick the file’s name from the list. Alternatively, you could add the extension “.PPT” to the file’s name and try double-clicking it directly.

Network and System Architecture

21. Name 3 different network topologies and draw a diagram for each one. (Optional: Draw and label a representative client, server, and network hub in appropriate places in your diagram.)

Answer: Ring, star, bus (with simple diagrams).

22. Sketch a brief diagram of a 3-tier enterprise information system architecture and label the major components.

Answer: See Figure 1.

23. Draw and label a diagram of a client-server system. What is the difference between a “thin client” and a “thick client”?

Answer: See Figure 2. Thick clients are typically large programs that run on peripheral machines and do most of the data processing within themselves. The server mainly supplies data to the thick client. Thin clients are smaller programs, often but not necessarily distributed via the network and written in Java, which run on the client’s machine and share the processing load with the server (some of the processing that a thick client would do is carried out on the server).

24. What is a “legacy” system?

Answer: A legacy system is usually an information system that has been in use for some time and contains valuable data, but is out-of-date or otherwise does not directly fit the enterprise information system architecture. Legacy systems are often departmental systems that were originally stand-alone and are based on older technology and proprietary design. The value of the data in these systems and the expense of their replacement and data conversion may require that they be maintained. Because of their older technology and original stand-alone design, it is usually difficult, but important, to integrate legacy systems into an enterprise architecture. Integration strategies often treat legacy systems as data sources (ie, back-end systems). Note that systems like this are still available for purchase and installation; buying a new system with an inflexible or out-of-date design creates an instant legacy system.

25. What is an “interface engine” and what advantages does it provide in a multisystem environment?

Answer: An interface engine is a middleware system that is designed to connect (interface) to multiple enterprise systems, often legacy systems, and allow them to share data. Each system connects to the interface engine and shares data through it, rather than connecting directly to other systems. This approach reduces the number of connections required between a group of systems and standardizes interface design.
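The reduction in connections can be shown with simple arithmetic. If each of n systems must exchange data directly with every other system, n(n - 1)/2 point-to-point interfaces are needed; routing everything through an interface engine needs only n. The sketch assumes the worst case, in which every pair of systems shares data, and counts each two-way link once.

```python
def point_to_point_interfaces(n_systems: int) -> int:
    """Links needed if every system connects directly to every other system."""
    return n_systems * (n_systems - 1) // 2

def interface_engine_interfaces(n_systems: int) -> int:
    """Links needed if every system connects only to a central interface engine."""
    return n_systems

for n in (4, 8, 12):
    print(f"{n} systems: {point_to_point_interfaces(n)} direct vs "
          f"{interface_engine_interfaces(n)} via interface engine")
```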

26. What is the main difference between a data repository and a data warehouse?

Answer: Data repositories are large-scale databases that integrate a variety of clinical data from multiple systems. They are typically optimized for large numbers of small transactions, are directly connected to clinical systems for continuous data acquisition, and support standard reports and individual patient lookup. They are typically inefficient (ie, slow) for queries returning, for example, populations or aggregations of related patients. Data warehouses are also large integrated databases, but they are internally structured to support efficient query for populations of patients with similar or related characteristics, and they may allow “drill down” from aggregate to individual patient results. They typically do not support large numbers of small transactions well, and they are not directly connected to clinical systems.

27. Networks that have a common bus topology (eg, networks using Ethernet) have complex methods for dealing with contention and collision on the medium. How do networks that use a ring topology avoid this problem?

Answer: They usually pass a data “token” from computer to computer. Only the computer that has the token controls network communications and thus collisions do not occur.
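A toy simulation of token passing, with hypothetical station names and frames, shows why collisions cannot occur: at each step only the station holding the token may transmit. Real token ring protocols add token recovery, priorities, and holding-time limits; this sketch simply lets each station send at most one frame per token visit.

```python
from collections import deque

def token_ring(stations, queued_frames, rounds):
    """Return the (station, frame) transmissions in the order they occur."""
    queues = {name: list(frames) for name, frames in queued_frames.items()}
    ring = deque(stations)
    transmissions = []
    for _ in range(rounds * len(stations)):
        holder = ring[0]                      # only the token holder may transmit
        if queues.get(holder):
            transmissions.append((holder, queues[holder].pop(0)))
        ring.rotate(-1)                       # pass the token to the next station
    return transmissions

log = token_ring(["A", "B", "C"], {"A": ["a1", "a2"], "C": ["c1"]}, rounds=2)
print(log)
```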

28. What is a “packet-switching” network?

Answer: A network in which data are broken down into fragments (packets) for transmission. The packets are independently communicated through the network as they move to the receiving device. As they pass through the network, they may be stored on intermediate devices, which negotiate the best communications path for the packets. Depending on conditions, packets may take different routes to the receiving device. At the receiving device, packets are reassembled to recreate the original data.
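The fragment-and-reassemble idea can be sketched in a few lines. Each packet carries a sequence number so that the receiver can restore the original order even when packets arrive out of sequence; the shuffle stands in for packets taking different routes. The packet size and message text are arbitrary choices for the example, and real protocols (eg, TCP/IP) also handle loss, retransmission, and error checking.

```python
import random

def packetize(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split a message into (sequence number, payload) packets."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the message by sorting on sequence number, whatever the arrival order."""
    return b"".join(payload for _, payload in sorted(packets))

message = b"Final diagnosis: benign nevus."
packets = packetize(message, size=8)
random.shuffle(packets)          # simulate out-of-order arrival over different routes
print(reassemble(packets))
```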

29. List 3 major differences (other than size!) between mainframe computers and desktop computers.

Answer: Choose from cost (mainframe much higher), intrinsic multiuser operating system with remote terminals in mainframe, much better security in mainframe, high availability design with essentially all hot swappable components in mainframe, and generally much higher storage and processing capacity in mainframes (data throughput, large memory capacity, multiple high-speed processors, etc).

30. Why does an Ethernet network slow down when many computers on the network are trying to communicate (ie, there is high network traffic)?

Answer: Computers on a busy network occasionally try to send data at the same time, resulting in packet collisions. When that occurs, the computers wait for a brief random interval and then try to communicate again. As the frequency of these collisions and the need to wait increases, the apparent speed of data transmission for each individual computer decreases.
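Classic Ethernet (IEEE 802.3) handles collisions with truncated binary exponential backoff: after the nth consecutive collision, a station waits a random number of slot times chosen from 0 to 2^min(n, 10) - 1, and it abandons the frame after 16 collisions. The sketch below models only the wait calculation; the 51.2-microsecond slot time applies to 10-Mb/s Ethernet. Because the waiting window doubles with each collision, delays grow quickly as traffic rises, which is the slowdown described in the answer.

```python
import random

SLOT_TIME_US = 51.2   # slot time for 10-Mb/s Ethernet, in microseconds
MAX_COLLISIONS = 16   # the frame is abandoned after this many collisions
BACKOFF_LIMIT = 10    # the exponent stops growing after 10 collisions

def backoff_delay_us(collisions: int, rng: random.Random) -> float:
    """Random wait (microseconds) after the given number of consecutive collisions."""
    if collisions >= MAX_COLLISIONS:
        raise RuntimeError("too many collisions; frame abandoned")
    max_slots = 2 ** min(collisions, BACKOFF_LIMIT) - 1
    return rng.randint(0, max_slots) * SLOT_TIME_US

rng = random.Random(42)
for n in range(1, 6):
    print(f"after {n} collision(s): wait {backoff_delay_us(n, rng):.1f} us "
          f"(window 0-{(2 ** n - 1) * SLOT_TIME_US:.1f} us)")
```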

31. List 2 advantages and 2 disadvantages of client-server systems versus mainframe systems.

Answer: Advantages include lower central hardware cost; built-in scalability (within limits); terminals (PCs) can be used for multiple purposes, including office automation software and general purpose communications; and client software has familiar graphical user interface. Disadvantages include limited scalability for large systems, requires more peripheral support (PCs vs terminals), more susceptibility to user error and malicious software (viruses/Trojan horses), lower reliability, and potential for conflicts with other software running on the PCs.

32. What is the main benefit of “layered” protocols, such as those used in network communications like the Open Systems Interconnection (OSI) networking model?

Answer: Data are passed in standard ways between the layers, thus the system is modular; developers creating software to run at any particular level of the process need not understand the details of what occurs at the other levels. This kind of model allows a similar software strategy to be applied across multiple operating systems, hardware platforms, and network environments merely by using the appropriate software modules for a given setting (rather than rewriting all the software layers for every minor change in the environment).
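Layering can be illustrated with encapsulation: on the way down, each layer wraps the data it receives with its own header, and on the way up each layer removes only its own header. The layer names below are a simplified subset chosen for the example, and the bracketed text headers are placeholders for the binary headers that real protocols use. No layer needs to know what the others add, which is the modularity described in the answer.

```python
LAYERS = ["transport", "network", "link"]   # simplified subset, top to bottom

def send(data: str) -> str:
    """Pass data down the stack; each layer prepends its own header."""
    for layer in LAYERS:
        data = f"[{layer}]" + data
    return data

def receive(frame: str) -> str:
    """Pass a frame up the stack; each layer strips only its own header."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"{layer} layer: unexpected header")
        frame = frame[len(header):]
    return frame

frame = send("GET /index.html")
print(frame)            # [link][network][transport]GET /index.html
print(receive(frame))   # GET /index.html
```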

33. List 2 kinds of data that are often externalized from legacy systems to create shared services in enterprise systems.

Answer: Choose from patient or person identifiers and basic demographics, scheduling, standard vocabularies and coding schemes, notification/alerting, security (eg, single logon), and expert and decision support systems.

34. The telephone system and the Internet represent 2 different types of networks with respect to how data and communications pathways are managed. Briefly describe the major differences in these systems.

Answer: The telephone system is a “circuit-switched” system. When a call is placed, a circuit is dedicated to the participants and its full capacity is available for that communication. All information from the call passes over the same circuit pathway. When the system is overloaded, many calls cannot be placed, but calls that do go through function normally. The Internet is a “packet-switched,” store-and-forward network. Information is broken down into packets, which are sent across a communal network and reassembled at the destination. Packets may traverse different pathways, depending on moment-to-moment network performance. When the network is overloaded, the packet transfer rate slows down, but communications are not generally refused.

35. Define the term “intranet.”

Answer: An intranet is a (usually) local network of computers that uses Internet protocols (Transmission Control Protocol/Internet Protocol [TCP/IP]) for communications, but is separated from the Internet, either completely or via a firewall.

LIS Systems

36. What is a result at order entry?

Answer: A piece of information about a laboratory test that is acquired from the user by the system at test-ordering time. An example might be the time period of urine collection in a creatinine clearance test.

37. What are the differences between a bidirectional and unidirectional instrument interface?

Answer: A bidirectional interface supports communication in both directions: test requests are downloaded from the system to the analyzer, and results are uploaded from the analyzer to the system. This allows, for example, a multitest chemistry analyzer to run a requested set of tests on a blood sample tube identified by its barcode, without the necessity for a technician to manually enter the requested tests. A unidirectional interface supports only the upload of results data from the analyzer to the system, based on the sample’s identification number. A technician must manually enter each requested test into the analyzer based on a log obtained from the LIS.

38. List the main functions of an AP LIS.

Answer: Receive and store patient and specimen information from the hospital information system or direct entry specimen accessioning; support pathology workflow, including supplying information to and storing information from pathologists and histotechnologists as needed; store information on specimen results and allow coding in standardized terminology; report specimen results; and allow searching and retrieval for lookup of previous diagnoses and retrospective studies.

39. What is MUMPS (also known as “M”) and how is it used in pathology information systems?

Answer: A very efficient scripting language used widely in the past for programming medical information systems (and financial systems). Many clinical and laboratory systems have been and still are written in MUMPS, although its use has been declining in recent years.

40. Describe the manner in which results from our AP LIS system are posted to the clinical data management system. For example, what sort of interface is used and how often are results posted? What is the general pathway from the AP LIS to the clinical data repository? How quickly would a physician have access to the results?

Answer: The AP system sends results in Health Level 7 (HL7) format through an interface to the message router (our interface engine), and from there another HL7 interface posts the final pathology results to the clinical data management system. This is done in real time and results are available almost immediately. The pathway to the MARS data repository is similar, that is, HL7 interface through the message router and from there through another HL7 interface to the repository.

41. Our clinical laboratory system communicates with a variety of devices located outside the laboratories using a variety of mechanisms. For example, some of these devices support results reporting in hospital locations. Name and describe the purpose of 3 different kinds of devices outside the laboratory with which the laboratory system communicates.

Answer: Choose from result printers (located in patient care areas to quickly communicate results to the printed chart and care teams), barcode printers (located where blood samples are obtained to print labels for specimen tubes), fax machines (autofaxing to fax machines outside the hospital itself allows immediate reporting of results to locations without result printers, or to outlying clinical and physician offices), beepers (autopaging allows communication of alerts directly to care personnel), and point-of-care testing base stations (allows acquisition of patient and quality control data from handheld point-of-care testing devices).


42. The Internet offers several other services in addition to the World Wide Web. Name 1 of these services and describe what it is used for.

Answer: Choose from telnet (remote control of computers via a command-line interface), file transfer protocol (FTP; file transfer between specific machines), gopher (a precursor to the Web, remote access to file collections), e-mail (or simple mail transfer protocol [SMTP], transport of e-mail messages between mail servers), and secure shell (or SSH, remote access to machines and file transfer with encryption).

43. What is the relationship between an Internet protocol (IP) address and a domain name?

Answer: An IP address is the numerical address used for identifying computers in TCP/IP communications. A domain name is an alphanumeric name for a computer that typically has meaning to users and is converted automatically to an IP address by the Domain Name System (DNS). If DNS is available, the easier-to-remember domain names can be used transparently instead of IP addresses in Internet communications.
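The sketch below illustrates the relationship with a hypothetical lookup table standing in for DNS (real lookups go through a distributed hierarchy of name servers, for example via Python's socket.gethostbyname). The names and addresses are placeholders taken from ranges reserved for documentation. The sketch also shows that a dotted IPv4 address is a readable form of a single 32-bit number.

```python
import ipaddress

# Hypothetical name-to-address table standing in for the DNS.
TOY_DNS = {
    "lis.example.org": "192.0.2.10",
    "www.example.org": "192.0.2.80",
}

def resolve(domain_name: str) -> ipaddress.IPv4Address:
    """Look up a domain name (case-insensitive) and return its IPv4 address."""
    return ipaddress.IPv4Address(TOY_DNS[domain_name.lower()])

address = resolve("LIS.example.org")
print(address)        # 192.0.2.10
print(int(address))   # the same address as one 32-bit number: 3221225994
```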

44. Match the following terms with their appropriate forms:

a. IP address

b. Ethernet (MAC) address

c. Domain name

d. URL (uniform resource locator)


f. http://jhh.cbmi.upmc.edu/pir/


h. 00:03:93:46:97:A6

Answer: a → g, b → h, c → e, and d → f.
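The forms in this question are distinctive enough to tell apart with simple pattern checks. The regular expressions below are rough shape tests written for this example, not full validators (for instance, the IP address pattern would accept 999.1.1.1), and the IP address used as a sample is a placeholder.

```python
import re

# Checked in order; each pattern tests only the general shape of the text.
FORMS = [
    ("URL", re.compile(r"^[a-z][a-z0-9+.-]*://\S+$", re.IGNORECASE)),
    ("Ethernet (MAC) address", re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$", re.IGNORECASE)),
    ("IP address", re.compile(r"^(\d{1,3}\.){3}\d{1,3}$")),
    ("Domain name", re.compile(r"^([a-z0-9-]+\.)+[a-z]{2,}$", re.IGNORECASE)),
]

def classify(text: str) -> str:
    """Return the first form whose pattern matches the whole text."""
    for label, pattern in FORMS:
        if pattern.match(text):
            return label
    return "unrecognized"

for sample in ["http://jhh.cbmi.upmc.edu/pir/", "00:03:93:46:97:A6",
               "192.0.2.10", "jhh.cbmi.upmc.edu"]:
    print(f"{sample:32} {classify(sample)}")
```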

45. What is a “cookie” in a Web browser?

Answer: A cookie is a piece of data that a Web server can store in a Web browser for future use. Cookies may indicate previous visits to or membership in a Web site, or may be used to keep track of the progress of a user as they interact with a Web site.
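Python's standard http.cookies module shows both halves of the exchange: the server sends a Set-Cookie header, and the browser returns stored cookies in a Cookie header on later requests. The cookie names and values here are hypothetical.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header sent with a response.
outgoing = SimpleCookie()
outgoing["visited"] = "yes"
outgoing["visited"]["path"] = "/"
print(outgoing.output())

# Later request: parse the Cookie header the browser sends back.
incoming = SimpleCookie("visited=yes; step=3")
print("user is at step", incoming["step"].value)
```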

Data Representation (Formats, Markup, and Coding)

46. What is the primary difference between hypertext markup language (HTML) and extensible markup language (XML)?

Answer: HTML is a markup language that uses a fixed set of markup tags defined by a document type definition (DTD) that is maintained by the World Wide Web Consortium (W3C). XML is a framework for creating markup languages like HTML and thus can be used to create DTDs defining arbitrary sets of markup tags.
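Because XML lets authors define their own tags, a document can use element names that describe its content, and a generic XML parser can still read it. The tags and report text below are hypothetical; the well-formed document is parsed without any DTD using Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical, user-defined tags; no predefined tag set is required.
report_xml = """
<report>
  <specimen id="S03-1234">
    <site>skin, left forearm</site>
    <diagnosis>benign compound nevus</diagnosis>
  </specimen>
</report>
""".strip()

root = ET.fromstring(report_xml)
for specimen in root.iter("specimen"):
    print(specimen.get("id"), "-", specimen.findtext("diagnosis"))
```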

47. What are the advantages of rich text format (RTF) over plain ASCII text?

Answer: RTF is a standard way of defining document format (font, size, style, margins, line spacing, etc) using ASCII sequences embedded in the document. RTF files are read by many different word processors, so formatting is preserved when documents are exchanged between programs. Plain ASCII text files contain only the text of the document without formatting information.

48. What is the primary purpose of the Unified Medical Language System (UMLS)?

Answer: To allow mapping between the major standard terminology (coding) systems used in health care and to provide a “laboratory” for the study of standard terminology.

49. Write an example of an ordered (numbered) list in HTML.
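An HTML ordered list wraps each item in an li element inside an ol element, and the browser supplies the numbers. The short Python sketch below generates such a list from hypothetical item text, escaping characters such as & that have special meaning in HTML, and prints the resulting markup.

```python
from html import escape

def ordered_list(items: list[str]) -> str:
    """Return HTML for a numbered list, one <li> element per item."""
    lines = ["<ol>"]
    lines += [f"  <li>{escape(item)}</li>" for item in items]
    lines.append("</ol>")
    return "\n".join(lines)

print(ordered_list(["Accession specimen", "Gross examination", "Sign out"]))
```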


50. Programs like Microsoft Word and PowerPoint can use text, images, audio, and other forms of data (ie, multimedia) within a single document. Typically these data elements are stored together as a single file on a disk. Web pages may also contain multimedia, such as images, video, and sound. How does the structure of a multimedia Web page differ from a multimedia word-processing or presentation document?

Answer: The data that make up a multimedia Web page (text, images, video, and sound) exist as separate files on one or more servers. These data are assembled into one display by the Web browser using links to the various files from the primary HTML page.
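One way to see this structure is to list the separate files a page refers to. The sketch below uses Python's standard html.parser to collect src attributes, which name files the browser fetches and embeds in the display; the page and file names are hypothetical. Ordinary hyperlinks (href) are left out because they lead to other pages rather than being assembled into this one.

```python
from html.parser import HTMLParser

class EmbeddedResources(HTMLParser):
    """Collect (tag, src) pairs for files the browser must fetch to build the page."""

    def __init__(self) -> None:
        super().__init__()
        self.resources: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "src" and value:
                self.resources.append((tag, value))

page = """<html><body>
<h1>Frozen section tutorial</h1>
<img src="images/slide1.jpg" alt="Frozen section">
<video src="media/grossing.mp4"></video>
<a href="quiz.html">Take the quiz</a>
</body></html>"""

parser = EmbeddedResources()
parser.feed(page)
for tag, src in parser.resources:
    print(tag, src)
```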

51. Given the following URL: “http://jhh.cbmi.upmc.edu/test/bb.html,” what kind of object on the server does “/test/” usually refer to?

Answer: A directory (“folder” is also acceptable).
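Python's urllib.parse can split the URL from question 51 into its parts. Treating the path segments before the last slash as directories is the usual convention for static Web servers, but a server may map paths to content in other ways, which is why the question says "usually."

```python
import posixpath
from urllib.parse import urlsplit

url = "http://jhh.cbmi.upmc.edu/test/bb.html"
parts = urlsplit(url)

print(parts.scheme)                     # protocol: http
print(parts.netloc)                     # server (domain name)
print(posixpath.dirname(parts.path))    # directory on the server: /test
print(posixpath.basename(parts.path))   # file within that directory: bb.html
```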

52. What is the difference between ASCII and Unicode and what is the advantage of Unicode?

Answer: ASCII is a standard numerical code for text characters in which the value of each character is encoded in a 1-byte number (actually, standard ASCII is 128 characters encoded in 7 bits; computer/operating system designers have used the eighth bit to add another 128 characters in a semistandard fashion). Unicode uses 2-byte numbers to encode characters and therefore can encode more than 64000 separate characters. This difference allows encoding of a variety of language-specific character variations, nonroman alphabets, and Chinese/Japanese characters.
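Python's string encoding functions make the difference concrete: characters outside the 128-character ASCII range cannot be encoded as ASCII, while a 16-bit Unicode encoding (UTF-16) represents each of these characters in 2 bytes. Unicode has since grown beyond 65,536 characters, and UTF-16 represents the additional characters with pairs of 16-bit units, so the 2-byte description applies to characters in the Basic Multilingual Plane, which includes most characters in common use. The sample strings are arbitrary.

```python
ascii_text = "ABC"
other_text = "Größe 病理"   # German letters plus Japanese/Chinese characters

print([ord(ch) for ch in ascii_text])        # [65, 66, 67]: all below 128

try:
    other_text.encode("ascii")
except UnicodeEncodeError as err:
    print("cannot encode as ASCII:", err.reason)

utf16 = other_text.encode("utf-16-le")       # little-endian, no byte-order mark
print(len(other_text), "characters ->", len(utf16), "bytes")
```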

53. What is the purpose of the HL7 standard and how is it used in health care settings?

Answer: HL7 defines standard formats and message structures for transferring data between health care information systems. HL7 messages are text; the general structure of the messages is defined in the standard, but the actual text content and coding schemes are set by users and can be different for each implementation.
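The sketch below parses a hypothetical HL7 version 2 message with plain string splitting to show the general structure: segments separated by carriage returns, fields by "|", and components by "^". The message content and the local code DX are invented for the example. A production parser must also honor the encoding characters declared in the MSH segment, repeating fields, and escape sequences, all of which this sketch ignores.

```python
# Hypothetical result message. Segments end with a carriage return (\r).
message = (
    "MSH|^~\\&|APLIS|PATH|CDR|HOSP|200301151200||ORU^R01|MSG0001|P|2.3\r"
    "PID|1||123456||DOE^JANE\r"
    "OBX|1|TX|DX^FINAL DIAGNOSIS^L||Benign compound nevus||||||F\r"
)

def parse_segments(msg: str) -> dict[str, list[list[str]]]:
    """Group the fields of each segment by segment name (MSH, PID, OBX, ...)."""
    segments: dict[str, list[list[str]]] = {}
    for line in msg.split("\r"):
        if line:
            fields = line.split("|")
            segments.setdefault(fields[0], []).append(fields)
    return segments

segs = parse_segments(message)
pid = segs["PID"][0]
obx = segs["OBX"][0]

# Outside MSH, list index n is field n (PID-5 = patient name, OBX-5 = value,
# OBX-11 = result status). MSH is offset by one because MSH-1 is the "|" itself.
family, given = pid[5].split("^")[:2]
print("Patient:", given, family)
print("Result:", obx[5], "| status:", obx[11])
```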

54. What do CPT codes describe and what is their primary use?

Answer: CPT stands for Current Procedural Terminology. These codes represent medical procedures (eg, a laboratory test, special stain, or cardiac catheterization) and are used to represent those procedures in professional billing systems.

55. What advantage(s) does the Systematized Nomenclature of Medicine (SNOMED) CT have as compared with previous versions of SNOMED?

Answer: The SNOMED CT contains the clinical terms from the British Read Codes (Clinical Terms), which gives SNOMED broad coverage of clinical practice; SNOMED CT also formally defines relationships between the terms it contains.

56. Write an example of HTML code that would link a section of text in one document to another document, and describe the difference between relative and absolute addresses.

Answer: The link.

Relative addresses locate a document with relation to the document containing the address (the link above contains a relative address). Absolute addresses locate a document to a specific server and directory tree (they point to only 1 particular place on the Internet and generally start with a protocol designation, eg, http://. . . ).
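Python's urllib.parse.urljoin resolves links the way a browser does, which makes the difference visible: relative addresses are interpreted against the address of the page that contains them, while an absolute address is used unchanged. The base page is the URL from question 51; the linked file names are hypothetical.

```python
from urllib.parse import urljoin

base = "http://jhh.cbmi.upmc.edu/test/bb.html"   # the page containing the links

print(urljoin(base, "answers.html"))                      # relative: same directory
print(urljoin(base, "../pir/index.html"))                 # relative: sibling directory
print(urljoin(base, "http://www.example.org/guide.html")) # absolute: used as-is
```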

57. Name 3 advantages of SNOMED CT over the International Classification of Diseases, Ninth Revision (ICD-9).

Answer: Examples include many more terms (greater comprehensiveness); defined relationships between terms; the ability to express severity of illness and other qualifications of diagnoses; the ability to combine terms for more accurate and precise description (compositionality); and more complete expression of symptoms and findings.

58. List 4 characteristics of an ideal standard medical nomenclature.

Answer: Examples include completeness; clarity (nonredundancy); compositionality (specified syntax and grammar for combining terms); methods to represent uncertainty and time; hierarchical organization; language independence; unique, context-free identifiers (codes); mappings to other nomenclatures; and nonproprietary status.

59. Your laboratory is developing a test reference guide for physicians to use when placing orders. You would like to put this guide online and also print it for distribution. Implementation possibilities include downloadable Word documents, a Web server with HTML documents, downloadable portable document format (PDF) documents, or an XML document server with style sheets. Describe 2 advantages that the XML approach might have over the other alternatives.

Answer: (1) XML documents can be formatted for different purposes using several different style sheets. Thus, the same document could be formatted differently for desktop computers, personal digital assistants (PDAs), or different classes of users without needing multiple versions of the document. (2) Unlike a collection of Word, HTML, or PDF files, appropriate software can allow a collection of XML documents to be indexed and searched by subsection.
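A minimal Python sketch of both advantages, using the standard library's `xml.etree.ElementTree`. The element names (`guide`, `test`, `specimen`, and so on) and the two entries are hypothetical; a real deployment would use XSLT style sheets for formatting, which the two small rendering functions here only imitate.

```python
import xml.etree.ElementTree as ET

GUIDE = """
<guide>
  <test code="GLU">
    <name>Glucose</name>
    <specimen>Plasma, gray-top tube</specimen>
    <turnaround>1 hour</turnaround>
  </test>
  <test code="TSH">
    <name>Thyroid-stimulating hormone</name>
    <specimen>Serum, red-top tube</specimen>
    <turnaround>24 hours</turnaround>
  </test>
</guide>
"""

root = ET.fromstring(GUIDE)

def find_by_subsection(root, tag, text):
    """Return codes of tests whose <tag> subsection contains the text (case-insensitive)."""
    return [
        test.get("code")
        for test in root.iter("test")
        if text.lower() in (test.findtext(tag) or "").lower()
    ]

def render_compact(root):
    """One short line per test, eg, for a small PDA screen."""
    return [f"{t.findtext('name')} ({t.get('code')})" for t in root.iter("test")]

def render_full(root):
    """Fuller entries from the same source, eg, for desktop or print."""
    return [
        f"{t.findtext('name')} ({t.get('code')}): {t.findtext('specimen')}; "
        f"turnaround {t.findtext('turnaround')}"
        for t in root.iter("test")
    ]

print(find_by_subsection(root, "specimen", "serum"))  # ['TSH']
print(render_compact(root))
print(render_full(root))
```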

60. The next version of HL7 (version 3) will use XML as the basis for structuring its text messages. What advantages does XML offer for this purpose?

Answer: XML can represent a variety of data models, including relational tables and object-oriented data; XML provides a hierarchical, arbitrarily nestable structure; XML can be both human-readable and machine processable; and XML-processing software and code libraries are available at low or no cost.

61. Digital Imaging and Communications in Medicine (DICOM) microglossaries are standard terminologies that are intended for use in the textual descriptions of particular types of images. What large standard terminology are the microglossaries derived from?

Answer: SNOMED.

62. Define the following terms with respect to clinical coding schemes: multiaxial and compositional.

Answer: Multiaxial refers to a coding system consisting of several independent nomenclatures that address different topics, such as body site and etiologic agent.

Compositional means that codes can be combined to specify concepts, such as combining codes for lung and inflammation and bacteria to represent bacterial pneumonia.

63. What is the purpose of the Logical Observation Identifier Names and Codes (LOINC) coding system?

Answer: The LOINC system is a coding scheme for specifying clinical observations. It was started as a nomenclature for clinical laboratory tests.

Databases, Data Mining, and Outcomes Research

64. Name 2 ways in which a relational database differs from a flat-file database.

Answer: Relational databases may have less data redundancy, be easier and less error-prone to update (especially when data are shared between records), provide more rapid response to certain kinds of queries, and require less space on disk and in memory (depending on the particular database design).

65. In a relational database, what is a primary key?

Answer: The primary key is the unique identifier of a row (record) in a database table.

66. You plan to develop a simple relational database for storage and retrieval of information for your research project on p53 levels in various tumor specimens. You want to keep track of the following information: patient name, patient date of birth, diagnosis, pathologist’s name, date of accession, surgical pathology accession number, p53 level, and location of frozen tissue. Which of these items should be the primary key?

Answer: Surgical pathology accession number.

67. You are developing a simple relational database to keep track of your recut slides and the articles you have read to go along with them. The information you want to store and retrieve is as follows: accession number (SP#); date of accession (DA); organ (ORGAN); procedure type (PT); diagnosis (DIAG); whether gross images are available in the AP information system (GROSS); whether special stains are available in the AP information system (SS); reference number (assigned by you) to an article you have on that diagnosis (REF#); name of first author of article (AUTH); title of article (TITLE); and journal, volume, and page numbers (JL). Draw a diagram of the table or tables you would build to handle these data. Label the fields using the abbreviations provided.

Answer: See Figure 3. A simple design might include 2 tables with the first containing the first 8 items above (ending with reference number) and the second containing the last 4 (beginning with reference number). The tables would be related through the reference number.
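One way to try the 2-table design is a throwaway SQLite database from Python's standard `sqlite3` module. The rows are hypothetical placeholders, and the column names are the question's abbreviations with "#" dropped (SP# becomes sp_num, REF# becomes ref_num) to keep the identifiers simple. Because the article details are stored once, 2 cases can share reference 1 without repeating the author and title.

```python
import sqlite3

SCHEMA = """
CREATE TABLE article (
    ref_num INTEGER PRIMARY KEY,  -- REF#, assigned by you
    auth    TEXT,                 -- AUTH: first author
    title   TEXT,                 -- TITLE
    jl      TEXT                  -- JL: journal, volume, pages
);
CREATE TABLE recut_case (
    sp_num  TEXT PRIMARY KEY,     -- SP#
    da      TEXT,                 -- DA: date of accession
    organ   TEXT,                 -- ORGAN
    pt      TEXT,                 -- PT: procedure type
    diag    TEXT,                 -- DIAG
    gross   INTEGER,              -- GROSS: 1 if gross images are available
    ss      INTEGER,              -- SS: 1 if special stains are available
    ref_num INTEGER REFERENCES article(ref_num)  -- REF#: links the 2 tables
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical placeholder rows, for illustration only.
conn.executemany("INSERT INTO article VALUES (?, ?, ?, ?)", [
    (1, "Author A", "Example article on colonic adenomas", "Example J. 2001;1:1-5"),
    (2, "Author B", "Example article on gastric lymphoma", "Example J. 2002;2:10-15"),
])
conn.executemany("INSERT INTO recut_case VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    ("S02-1001", "2002-03-04", "colon", "biopsy", "tubular adenoma", 1, 0, 1),
    ("S02-1002", "2002-03-11", "colon", "polypectomy", "tubular adenoma", 0, 0, 1),
    ("S02-1003", "2002-04-02", "stomach", "biopsy", "MALT lymphoma", 1, 1, 2),
])

def cases_with_articles(conn, organ):
    """Join each case to its article through the shared reference number."""
    return conn.execute(
        """SELECT c.sp_num, c.diag, a.auth, a.title
           FROM recut_case AS c JOIN article AS a ON c.ref_num = a.ref_num
           WHERE c.organ = ?
           ORDER BY c.sp_num""",
        (organ,),
    ).fetchall()

for row in cases_with_articles(conn, "colon"):
    print(row)
```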

68. List 3 differences between outcomes research and standard clinical trials.

Answer: Outcomes research is carried out with patients who have received standard and usually somewhat variable clinical care in nonresearch settings; patients in clinical trials receive uniform care specified by planned protocols. Outcomes research studies do not have formal control groups that are known to be comparable to the experimental groups. Outcomes studies are generally retrospective and are carried out against large existing data sets; clinical trials are generally prospective.

69. Outcomes research is often limited in the conclusions that can be drawn because of limitations in the data sources used for the studies. What are the most common data sources and what are their main limitations?

Answer: The most common data sources include large local or regional administrative databases from hospitals, insurers, or government agencies. These databases contain very limited clinical information (usually ICD-9 codes), and thus it is difficult to meaningfully stratify patients by the severity of their illness, particular symptoms or test result characteristics, or the details of their therapy.

70. Define “association rules” and describe their use in exploratory data mining.

Answer: Association rules express the likelihood that features or events co-occur in records in a database (eg, if a patient has characteristics A and B, he or she has an 80% chance of having characteristic C). Data-mining software can identify such associations automatically in large data sets. Although many associations are trivial, some may reflect causative or “common cause” relationships. Changes in associations over time may also provide useful information.
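A minimal Python sketch of the 2 measures usually attached to an association rule, support (how often the items occur together) and confidence (how often the consequent occurs when the antecedent does), plus a brute-force search for rules above thresholds. The 8 records and the thresholds are hypothetical; production data-mining tools use more efficient algorithms, such as Apriori, rather than checking every combination.

```python
from itertools import combinations

# Hypothetical records: the set of characteristics present for each patient.
records = [
    {"A", "B", "C"}, {"A", "B", "C"}, {"A", "B", "C"}, {"A", "B", "C"},
    {"A", "B"}, {"A", "C"}, {"B"}, {"C"},
]

def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    return sum(itemset <= r for r in records) / len(records)

def confidence(antecedent, consequent, records):
    """Estimated P(consequent | antecedent) = support(both) / support(antecedent)."""
    return support(antecedent | consequent, records) / support(antecedent, records)

def mine_rules(records, min_support=0.5, min_confidence=0.75):
    """Check every rule with 1 or 2 antecedent items and a single consequent."""
    items = sorted(set().union(*records))
    rules = []
    for size in (1, 2):
        for antecedent in combinations(items, size):
            for consequent in items:
                if consequent in antecedent:
                    continue
                a, c = set(antecedent), {consequent}
                if support(a | c, records) >= min_support:
                    conf = confidence(a, c, records)
                    if conf >= min_confidence:
                        rules.append((antecedent, consequent, round(conf, 2)))
    return rules

# 4 of the 5 records with A and B also have C, matching the 80% example above.
print(confidence({"A", "B"}, {"C"}, records))
for rule in mine_rules(records):
    print(rule)
```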

71. Why is a data warehouse useful in data mining? Is a data warehouse required for data mining? Why or why not?

Answer: A data warehouse allows efficient (rapid) extraction of aggregates of records with the proper form for mining. Other types of databases or even flat files may be used for data mining, but are more cumbersome and time-consuming to work with.

72. What advantage does a pathologist have over investigators in most other fields in carrying out outcomes or data-mining studies?

Answer: Some of the most important and useful data in clinical data mining are derived from pathology services (anatomic pathology diagnoses and laboratory test results). In most places, pathologists manage the systems that contain these key data.

73. What are process measures in outcomes research and why are they sometimes used in place of actual outcomes data?

Answer: A process measure is a piece of data that is closely related to an outcome, but is easier to measure or more available than the actual outcome data. Thus, it is convenient to use as a surrogate measure for the outcome. For example, the effect of a diabetes health education program for physicians and patients might be measured by counting eye examinations and regular glycosylated hemoglobin measurements (ie, good practices) rather than by assessing the actual long-term health of the patients with diabetes.

74. Imagine that you are a residency director building a database to keep track of residents and their rotation evaluations. For each resident, you want to keep track of his or her social security number (SSN), first name, last name, and area of special interest. Each resident can have multiple evaluations (because they have many rotations). For each evaluation, you want to keep track of the resident it belongs to, the rotation, the last name of the attending physician who completed the evaluation, and the text of the evaluation. Assume a relational design. Create a conceptual schema for this database using Entity-Relationship Diagramming techniques, including entities, attributes, primary keys, relationships, cardinality ratios, and participation constraints.

Answer: See Figure 4. Essentially, there are 2 entities, Resident and Evaluation. Resident has 4 properties: SSN (the primary key), last_name, first_name, and interest_area. Evaluation has 5 properties: evaluation_ID (the primary key), resident_ID (a foreign key holding the SSN of the resident evaluated), rotation_name, attending_name, and text. Resident and Evaluation have a “has_a” relationship that is 1..n (1 resident, many evaluations).
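The conceptual schema can be checked by building it in SQLite with Python's `sqlite3` module; this is a sketch with placeholder data, not the exact design in the figure. The foreign key on `resident_id` carries the 1..n relationship, and the NOT NULL constraint expresses total participation of Evaluation (every evaluation must belong to a resident).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE resident (
    ssn           TEXT PRIMARY KEY,
    last_name     TEXT NOT NULL,
    first_name    TEXT NOT NULL,
    interest_area TEXT
);
CREATE TABLE evaluation (
    evaluation_id  INTEGER PRIMARY KEY,
    resident_id    TEXT NOT NULL REFERENCES resident(ssn),  -- the "n" side
    rotation_name  TEXT NOT NULL,
    attending_name TEXT NOT NULL,
    text           TEXT
);
""")

# Hypothetical placeholder data.
conn.execute("INSERT INTO resident VALUES (?, ?, ?, ?)",
             ("000-00-0001", "Doe", "Jane", "informatics"))
conn.executemany(
    "INSERT INTO evaluation (resident_id, rotation_name, attending_name, text) "
    "VALUES (?, ?, ?, ?)",
    [("000-00-0001", "Informatics", "Smith", "Strong project work."),
     ("000-00-0001", "Autopsy", "Jones", "Thorough and reliable.")],
)

def evaluation_counts(conn):
    """Number of evaluations per resident (1 resident, many evaluations)."""
    return conn.execute(
        """SELECT r.last_name, COUNT(e.evaluation_id)
           FROM resident AS r LEFT JOIN evaluation AS e ON e.resident_id = r.ssn
           GROUP BY r.ssn ORDER BY r.last_name"""
    ).fetchall()

print(evaluation_counts(conn))  # [('Doe', 2)]

# An evaluation for a resident who does not exist violates the foreign key.
try:
    conn.execute(
        "INSERT INTO evaluation (resident_id, rotation_name, attending_name, text) "
        "VALUES ('999-99-9999', 'Cytology', 'Lee', 'No such resident.')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print("orphan evaluation rejected:", orphan_rejected)
```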

75. Describe the main difference between the hypothesis-testing and hypothesis-generating approaches to data mining.

Answer: In hypothesis testing, data mining is used to determine whether and under what conditions a proposed pattern exists in a large data set. In hypothesis generation, data mining is used to discover patterns in the data without prior knowledge of what kinds of patterns might exist.

76. What is an “entity-relationship” diagram useful for?

Answer: An entity-relationship diagram is a way of illustrating the structure of a relational database in a simple format. It displays the primary “entities” (tables) in the database and the relationships that exist between the data elements in the tables. It is useful as a basis for discussion during database design and in describing existing databases.

77. Define structured query language (SQL).

Answer: SQL is a standard programming language that is used to create and alter the structure of relational databases, and to store and retrieve data from them.

Expert Systems and Artificial Intelligence

78. Neural networks have been used to build systems that can perform pattern recognition tasks, such as identification of abnormal Papanicolaou tests. One concern about the use of neural networks is that they are not “inspectable.” What does this mean and why is it a drawback?

Answer: Because of the general structure of neural networks, it is difficult to determine what data features are most important in a particular neural network’s operation or why those features were chosen as important. Thus, neural networks are essentially “black boxes” and cannot be used to identify a particular decision strategy that can be used outside the network.

79. What is an inference engine?

Answer: An inference engine is the component of an expert system that applies a knowledge base to current data to arrive at a conclusion. In rule-based systems, the inference engine applies the rules to the current data and reports the result.
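A minimal Python sketch of the split between knowledge base and inference engine described above. The rules are hypothetical and simplified for illustration, not clinical guidance, and this engine uses forward chaining (firing any rule whose conditions are met until nothing new can be concluded), which is only one of several strategies an inference engine may use.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str
    conditions: frozenset  # facts that must all be present for the rule to fire
    conclusion: str        # fact asserted when the rule fires

# Knowledge base: hypothetical, simplified rules.
KNOWLEDGE_BASE = [
    Rule("anemia", frozenset({"hemoglobin_low"}), "anemia"),
    Rule("microcytic", frozenset({"anemia", "mcv_low"}), "microcytic_anemia"),
    Rule("iron", frozenset({"microcytic_anemia", "ferritin_low"}), "suggest_iron_deficiency"),
]

def forward_chain(facts, rules):
    """Inference engine: fire rules whose conditions hold until nothing new is learned."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conditions <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                changed = True
    return known

patient = {"hemoglobin_low", "mcv_low", "ferritin_low"}
print(sorted(forward_chain(patient, KNOWLEDGE_BASE) - patient))
# ['anemia', 'microcytic_anemia', 'suggest_iron_deficiency']
```

Because the engine knows nothing about hematology, the same `forward_chain` function could run a different knowledge base unchanged.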

80. You are interested in building a decision support system that suggests a diagnosis in difficult glial tumor cases. The director of neuropathology, Dr Heinrich Neuromann, has offered to help you by imparting all of his wisdom on the subject. He gives you a list of findings and can tell you how he relates them to the diagnosis, how important each one is, and how the presence or absence of each finding weighs into his decision. Of the several types of expert systems we discussed, which would be most appropriate for this kind of problem?

Answer: A Bayesian belief network.

81. Most of the artificial intelligence systems we discussed rely on some kind of knowledge representation, with the notable exception of neural networks. Where is the “knowledge” in a neural network stored?

Answer: In the weightings between the nodes or “neurons.”

82. How are neural networks different than Bayesian belief networks along the following dimensions: (1) inspectability of knowledge, (2) need for probabilities acquired from “domain” experts, (3) need for data to train the system, and (4) ability of the system to make classifications based on input data. (Note: You may find it helpful to make a 2 × 4 table and include a short phrase or two in each cell.)

Answer: Bayesian belief networks are inspectable, known probabilities are required, training data are not needed, and they can classify into multiple categories.

Neural networks are not inspectable, they do not need domain expertise or known probabilities, training data are required, and they are best for a binary classification (“yes” or “no”).
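For the glial tumor scenario in question 80, the simplest Bayesian belief network has one diagnosis node whose findings are assumed independent given the diagnosis (a naive Bayes model). A minimal Python sketch follows; the diagnoses and findings are chosen for illustration, and every probability is hypothetical, standing in for numbers an expert such as Dr Neuromann might supply. Every number that drives the conclusion can be read directly (the system is inspectable), and no training data are used.

```python
# Hypothetical expert-supplied probabilities; not real estimates.
PRIOR = {"astrocytoma": 0.7, "oligodendroglioma": 0.3}
# P(finding is present | diagnosis)
LIKELIHOOD = {
    "perinuclear_halos": {"astrocytoma": 0.1, "oligodendroglioma": 0.8},
    "chicken_wire_vessels": {"astrocytoma": 0.2, "oligodendroglioma": 0.7},
}

def posterior(findings):
    """Bayes' rule, assuming findings are independent given the diagnosis."""
    unnormalized = {}
    for dx, prior in PRIOR.items():
        p = prior
        for finding, present in findings.items():
            p_present = LIKELIHOOD[finding][dx]
            p *= p_present if present else 1.0 - p_present
        unnormalized[dx] = p
    total = sum(unnormalized.values())
    return {dx: p / total for dx, p in unnormalized.items()}

result = posterior({"perinuclear_halos": True, "chicken_wire_vessels": True})
print({dx: round(p, 3) for dx, p in result.items()})
# {'astrocytoma': 0.077, 'oligodendroglioma': 0.923}
```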

83. What is the Arden syntax?

Answer: The Arden syntax is a standard language and format for representing the medical knowledge and algorithms required for making medical decisions. It is used in medical decision support systems.

84. You are working with an intensive care unit (ICU) attending physician on a project to see if you can predict readmission for patients with pancreatitis. You have access to a large database of ICU data (such as cardiac catheter values, vital signs, and respiratory parameters), as well as all of the data that can be gleaned from the LIS. There are approximately 800 measurements of various types for each of 4000 patients. You do not really have any specific ideas about what values would be most predictive; in fact, you think it is likely that the predictors are highly complex combinations of factors. Which of the 3 types of artificial intelligence systems would be most appropriate for this problem? Why?

Answer: A neural network is most appropriate, because there is no prior knowledge to allow selection of predictors, the relative weighting of predictors is unknown, a large data set of many discrete potential predictors is available, combinations of predictors may provide better discrimination than individual predictors, and the desired classification is binary (readmission likely or unlikely).

85. Rule-based systems underlie most clinical event monitors (programs that detect important clinical events and notify appropriate medical personnel). Often these systems work in conjunction with data from the clinical pathology LIS. What aspects of clinical pathology make a rule-based system a reasonable approach?

Answer: Clinical laboratory databases consist of many discrete test results that have known reference ranges and critical values. Well-established patterns of these results exist that are known to be related to important clinical conditions. Writing rules that detect and alert to these patterns is straightforward.
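A minimal sketch of such a rule-based monitor in Python. The critical limits below are hypothetical placeholders (each laboratory sets its own), and a real monitor would also need rules for multiresult patterns and a way to notify the appropriate clinician.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CriticalRule:
    test: str
    low: Optional[float] = None   # alert when the result is below this value
    high: Optional[float] = None  # alert when the result is above this value

# Hypothetical critical limits, for illustration only.
RULES = [
    CriticalRule("potassium", low=2.8, high=6.2),
    CriticalRule("glucose", low=40, high=500),
    CriticalRule("sodium", low=120, high=160),
]

def check_result(test, value, rules=RULES):
    """Return an alert message if a result crosses a critical limit, else None."""
    for rule in rules:
        if rule.test != test:
            continue
        if rule.low is not None and value < rule.low:
            return f"ALERT: {test} {value} is below critical limit {rule.low}"
        if rule.high is not None and value > rule.high:
            return f"ALERT: {test} {value} is above critical limit {rule.high}"
    return None

# Incoming results, as an event monitor might receive them from the LIS.
incoming = [("potassium", 4.1), ("potassium", 6.8), ("glucose", 35), ("sodium", 140)]
alerts = [msg for test, value in incoming if (msg := check_result(test, value))]
for msg in alerts:
    print(msg)
```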

86. Artificial intelligence and data-mining systems often use “training data sets” and “test data sets.” Define these terms and describe briefly how these data sets are used.

Answer: Training data sets are given to systems initially to teach them to make correct responses. Test data sets are drawn from the same kind of data as the training sets but contain separate records that were not used in training; they are used to verify the performance of the trained systems.
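A minimal Python sketch of holding out a test set. The 70/30 split, the fixed random seed, and the integer stand-ins for cases are arbitrary assumptions; the essential point is that the test cases are never used in training.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records, then hold out a fraction that training never sees."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

cases = list(range(100))  # stand-ins for 100 hypothetical labeled cases
train, test = train_test_split(cases)
print(len(train), len(test))        # 70 30
print(set(train).isdisjoint(test))  # True: no case is in both sets
```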

Programming and Development

87. What is the difference between a procedural and an event-driven computer program?

Answer: Procedural programs generally start when a defined set of data is input and then run through a set of sequential steps until they finish or until more data are needed. Event-driven programs cycle continuously, looking for any one of multiple events (mouse clicks, key presses, menu choices) that signal a particular path of processing.
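The contrast can be sketched in a few lines of Python. The event names and handlers are hypothetical, and the loop drains a fixed list of events so that it terminates, whereas a real graphical program's loop waits indefinitely for the next mouse click or key press.

```python
from collections import deque

def procedural_mean(values):
    """Procedural: start with a defined input, run fixed steps in order, then finish."""
    total = sum(values)   # step 1
    count = len(values)   # step 2
    return total / count  # step 3, then the program is done

def run_event_loop(events, handlers):
    """Event-driven: repeatedly take the next event and dispatch it to its handler."""
    queue = deque(events)  # stands in for the operating system's event queue
    log = []
    while queue:
        kind, payload = queue.popleft()
        handler = handlers.get(kind)
        if handler is not None:  # events with no handler are ignored
            log.append(handler(payload))
    return log

HANDLERS = {
    "click": lambda target: f"clicked {target}",
    "key": lambda key: f"pressed {key}",
    "menu": lambda choice: f"chose {choice}",
}

print(procedural_mean([4, 5, 6]))  # 5.0
print(run_event_loop(
    [("click", "Save"), ("key", "F1"), ("scroll", 3), ("menu", "Print")], HANDLERS))
# ['clicked Save', 'pressed F1', 'chose Print']
```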

88. Name 1 advantage of object-oriented programming.

Answer: Choose from (1) the program code is modular with classes that can be tested and verified as individual units, thus it can be debugged more easily and effectively; (2) well-designed classes are self-contained with fewer side effects and dependencies on other elements, thus are usually more reliable; and (3) classes can incorporate components of other classes in well-defined ways and can be incorporated into multiple programs, leading to greater code reuse and development efficiency.

89. List 2 advantages of open source software.

Answer: Choose from (1) lower initial cost, (2) not dependent on a single vendor and can be supported by users, (3) available source code means that problems can be reviewed and corrected by users, and (4) users can extend the code, providing locally needed functions.

90. In project development, what is the purpose of a “scope” document?

Answer: A scope document describes the environment in which software will be developed and defines in detail the required features of the finished product. It outlines, from the user’s perspective, all components that will be included in a project, but it generally does not specify a particular technical implementation. A scope document defines (generally in nontechnical terms) what will and will not be included in a project.

91. What happens when a computer program is compiled?

Answer: It is converted by a compiler program from textual source code to machine language that can be executed by the central processing unit.

92. Give an example of an object, a property it has, and a behavior it has.

Answer: The class discussed an example of a “balloon” object, with properties of “color” and “size,” and behaviors of “blow up” and “pop.” More practically, there might be a “pathology report” object with properties of “patient name” and “final diagnosis” (among others), and behaviors of “display” and “print.”
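The practical example translates directly into a Python class. This is a sketch: the method name `print_report` (rather than `print`) and its `printer` parameter are illustrative choices, the latter letting output be redirected instead of always going to the screen.

```python
class PathologyReport:
    """A pathology report object: properties hold data, methods define behaviors."""

    def __init__(self, patient_name: str, final_diagnosis: str):
        self.patient_name = patient_name        # property
        self.final_diagnosis = final_diagnosis  # property

    def display(self) -> str:
        """Behavior: produce text for the screen."""
        return f"Patient: {self.patient_name}\nFinal diagnosis: {self.final_diagnosis}"

    def print_report(self, printer=print) -> None:
        """Behavior: send the report to an output device (the console by default)."""
        printer(self.display())

report = PathologyReport("Doe, Jane", "Tubular adenoma")
report.print_report()
```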

93. List 3 things one should do in evaluating enterprise software (eg, an AP information system) prior to purchase, other than comparing software features and price.

Answer: Choose from (1) develop a requirements document that clearly states and prioritizes local needs before beginning to review software, (2) consider the financial health and technical expertise of candidate vendor companies, (3) talk with several different customers of each vendor concerning their satisfaction with the product and service, (4) ask the vendor to provide an on-site demonstration of the software, and (5) visit at least 1 customer site for vendors being considered seriously, to evaluate the software in operation.

94. Define “high-level programming language” and give an example.

Answer: In high-level languages, each line of source code corresponds to many lines of machine language. The source code is also typically more “English-like” than that of lower level languages. Basic functions such as memory management are handled automatically. As a result, programs are easier and faster to write in high-level languages and are easier to debug. Examples include Perl, Python, Visual Basic, and Java (others are also acceptable).

95. What is the unified modeling language (UML)?

Answer: UML is a standard way to diagram the structure of programs, program functions, program-program interactions, and program-user interactions.


96. Joint photographic experts group (JPEG) and graphics interchange format (GIF) are alternative strategies for compressing image files. What types of images are each of these best suited for?

Answer: JPEG is best for continuous tone (photographic) images, whereas GIF is best for line art and other images that have large areas of constant color.

97. What determines the resolving power of a digital camera?

Answer: The magnifying power of the lens and the number of pixels in the camera’s detector.

98. In digital microscopy, which image characteristics are related by the modulation transfer function?

Answer: Contrast and resolution.

99. Which of the following does image compression take advantage of (circle all that are correct)?

a. Data redundancy

b. Data smog

c. Data modeling

d. Data irrelevance

Answer: a and d.

100. We often hear terms such as “a high-resolution 1600 × 1200 camera.” What does this mean and what, if anything, does this mean for the resolution of the images captured by this camera?

Answer: This means that the charge-coupled device (CCD) detector in the camera measures 1600 (horizontal) pixels by 1200 (vertical) pixels. This is approximately full size on a 21-inch monitor and will yield a reasonably sharp 5 × 7-inch print. It is much lower resolution than photographic film.

101. Describe the relationship between field of view, the microscope magnification, and the CCD size.

Answer: Field of View = CCD Size/Magnification.
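A worked example of the relationship in Python. The sensor dimensions (8.8 × 6.6 mm, the nominal 2/3-inch format), the 1600-pixel width, and the 40× magnification are assumed values, and "magnification" here means the total optical magnification from specimen to sensor (objective plus any camera coupler, excluding the eyepieces). The second function shows how much tissue each pixel samples.

```python
def field_of_view_mm(ccd_width_mm, ccd_height_mm, magnification):
    """Field of view at the specimen = sensor size / magnification onto the sensor."""
    return ccd_width_mm / magnification, ccd_height_mm / magnification

def microns_per_pixel(fov_width_mm, pixels_across):
    """Width of specimen sampled by each pixel, in micrometers."""
    return fov_width_mm * 1000 / pixels_across

# Assumed: 8.8 x 6.6 mm sensor, 1600 pixels across, 40x total magnification.
fov_w, fov_h = field_of_view_mm(8.8, 6.6, 40)
print(round(fov_w, 3), round(fov_h, 3))          # 0.22 0.165
print(round(microns_per_pixel(fov_w, 1600), 4))  # 0.1375
```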

102. What is a “digital slide” and how is it constructed?

Answer: A digital slide is a high-resolution digitized image of an entire microscope slide. It is constructed by “tiling” many small high-resolution images covering all the tissue on the slide.
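How many images a digital slide needs follows from the field of view. In this Python sketch the tissue size and field of view are assumed values, and the 10% overlap between neighboring tiles is an assumption (some overlap is commonly used so software can align the tiles), not part of the definition above.

```python
import math

def tiles_needed(tissue_w_mm, tissue_h_mm, fov_w_mm, fov_h_mm, overlap=0.1):
    """Columns, rows, and total tiles needed to cover a rectangular tissue area."""
    def count(extent, fov):
        step = fov * (1 - overlap)  # distance the stage moves between tiles
        # The last tile starts at (n - 1) * step and must reach the far edge.
        return max(1, math.ceil((extent - fov) / step) + 1)

    cols = count(tissue_w_mm, fov_w_mm)
    rows = count(tissue_h_mm, fov_h_mm)
    return cols, rows, cols * rows

# Assumed: 15 x 10 mm of tissue imaged with a 0.22 x 0.165 mm field of view.
print(tiles_needed(15, 10, 0.22, 0.165))  # (76, 68, 5168)
```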

Consumer Health Informatics

103. List 2 weaknesses in e-mail and Web-based consultations between physicians and patients as compared with face-to-face discussion.

Answer: Choose from (1) follow-up questions are more difficult, (2) medical complaints cannot be checked by physical examination, and (3) security cannot be assured.

104. What kind of information or interaction are health care consumers most interested in gaining from the Internet?

Answer: They would like questions they submit to be answered directly by health care workers, including physicians.

105. Some Internet sites that purport to offer health care information or online consultation are actually thinly disguised commercial enterprises. What are many of these sites actually selling?

Answer: Prescription drugs.

106. List 3 reasonable indicators consumers could use to evaluate the quality of health care Web sites and information on the Internet (without reading the content of the information as an expert).

Answer: The site subscribes to a “good practices” code, such as the guidelines of the Health on the Net Foundation (HONCode); the site is managed by a trusted institution (eg, the National Institutes of Health), a national health organization (eg, American Medical Association or American Lung Association), or a school of medicine; the site is referenced from the site of a reputable organization or a trusted group, such as HealthWeb (http://healthweb.org/).

107. Describe 1 way that Web sites offering consumer health information can reassure visitors that they maintain high standards of accuracy.

Answer: Sites could subscribe to health care information quality standards, such as those developed by the Health on the Net Foundation (HONCode) or the Internet Healthcare Coalition eHealth Code of Ethics.

New Technologies

108. What is the difference between continuous and discrete speech recognition?

Answer: Discrete speech recognition requires a brief pause between words; continuous speech recognition allows natural speech rhythm without pauses.

109. In virtual-reality terms, describe what is meant by “augmented reality” and list an advantage of this approach and a possible application in health care.

Answer: Augmented reality refers to a combined view consisting of computer-generated graphics overlaid on the real world. This approach allows internal or other characteristics of real objects to be highlighted. In medicine, one application might be the visual overlay of a representation of internal anatomy (determined by ultrasound or computed tomography/magnetic resonance imaging) on a patient to aid in fine-needle aspiration of a small mass.

110. What is the difference between speech recognition and natural language recognition?

Answer: Speech recognition is the automatic conversion of speech to text and can be discrete or continuous. Natural language recognition is the automatic detection of meaning in text and is used in programs such as autocoders, which assign standard codes (eg, SNOMED) to the text of reports.

Information Security

111. What is HIPAA and why has it had a major impact on the management of electronic medical records?

Answer: HIPAA is the Health Insurance Portability and Accountability Act, passed by Congress in 1996. The goals of HIPAA were to ensure health insurance portability, reduce fraud and abuse in health care, enforce standards for health information, and guarantee the security and privacy of health information. HIPAA-related guidelines for good practices in health information management, particularly in the areas of privacy and confidentiality of patient-identifiable information, are driving significant changes in the way that medical record information is processed, stored, and accessed in electronic systems.

112. List 1 requirement of the HIPAA regulations and briefly describe how it will affect health care information systems.

Answer: There are a number of choices. Two good ones would be (1) a requirement that patients be able to review and annotate their own medical records (raises technical challenges, including security concerns and how to manage and review annotations), or (2) a requirement that patients be able to see who has reviewed their records (requires comprehensive audit trails of who has looked at what and security such that people who need to know information can be identified and allowed to view it while others are excluded).

113. When discussing clinical information in the context of medical research, what is “deidentification”? What is “anonymization”? What is the difference between these terms?

Answer: Deidentified records are not directly traceable to patients by researchers, but an indirect mechanism exists through which data can be traced to a particular patient if necessary. Anonymized records have no identifying information and no mechanism for associating data with a particular patient.

114. Give 2 reasons why one might use deidentified rather than anonymized data in clinical research.

Answer: (1) In an appropriately designed research project, deidentified records from a particular patient can be linked over time without revealing the identity of the patient. This linkage allows longitudinal studies to follow the history of diseases or treatments. (2) In a study in which the results may have important implications for a patient’s health, deidentified data can be traced back to individual patients and their physicians if particular follow-up actions, counseling, or a change in therapy is necessary. Neither of these is possible with anonymized data.
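The practical difference can be sketched in Python. The `HonestBroker` class and its methods are hypothetical illustrations of a trusted party that holds the only link between study codes and patients; the sketch strips only a name and a medical record number, whereas real deidentification must remove many more identifiers.

```python
import secrets

IDENTIFIERS = {"mrn", "name"}  # real deidentification must remove many more identifiers

class HonestBroker:
    """Keeps the only table linking study codes to patient identifiers."""

    def __init__(self):
        self._code_for_mrn = {}
        self._mrn_for_code = {}

    def study_code(self, mrn):
        """Same patient, same code every time, so records can be linked over time."""
        if mrn not in self._code_for_mrn:
            code = "S" + secrets.token_hex(4)
            while code in self._mrn_for_code:  # avoid the rare random collision
                code = "S" + secrets.token_hex(4)
            self._code_for_mrn[mrn] = code
            self._mrn_for_code[code] = mrn
        return self._code_for_mrn[mrn]

    def reidentify(self, code):
        """Controlled route back to the patient, eg, for clinically important follow-up."""
        return self._mrn_for_code[code]

def deidentify(record, broker):
    """Drop direct identifiers but keep a linkable study code."""
    clean = {k: v for k, v in record.items() if k not in IDENTIFIERS}
    clean["study_code"] = broker.study_code(record["mrn"])
    return clean

def anonymize(record):
    """Drop direct identifiers with no way back to the patient."""
    return {k: v for k, v in record.items() if k not in IDENTIFIERS}

broker = HonestBroker()
visit_2001 = {"mrn": "123456", "name": "Doe, Jane", "year": 2001, "dx": "adenoma"}
visit_2003 = {"mrn": "123456", "name": "Doe, Jane", "year": 2003, "dx": "adenocarcinoma"}
d1, d2 = deidentify(visit_2001, broker), deidentify(visit_2003, broker)
print(d1["study_code"] == d2["study_code"])  # True: longitudinal linkage is possible
print(anonymize(visit_2001))                 # no study code, so no linkage and no way back
```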

115. What is the single greatest security risk facing any medical center today?

Answer: Inappropriate behavior by employees, such as discussing or viewing patient information in inappropriate settings, providing copies of patient information to unauthorized individuals, putting sensitive documents in the trash without shredding them, failing to log out when finished with a session, or giving their passwords to others.

116. What are the characteristics of a good-quality password for security purposes?

Answer: It should have at least 8 characters, not be a name or dictionary word, contain at least 1 digit and 1 punctuation mark, and not be obviously related to the user (eg, a street address would not be good).
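A minimal Python check of the criteria listed above. The tiny word list is a hypothetical stand-in for a real dictionary, and matching personal information by substring is a simplification.

```python
import string

# Hypothetical stand-in for a real dictionary or list of common passwords.
COMMON_WORDS = {"password", "pathology", "welcome", "letmein"}

def password_problems(password, personal_info=()):
    """Return the ways a password falls short of the criteria (empty list = acceptable)."""
    problems = []
    lowered = password.lower()
    if len(password) < 8:
        problems.append("fewer than 8 characters")
    if not any(ch.isdigit() for ch in password):
        problems.append("no digit")
    if not any(ch in string.punctuation for ch in password):
        problems.append("no punctuation mark")
    if lowered in COMMON_WORDS:
        problems.append("dictionary word")
    for item in personal_info:  # eg, name, street address, birth year
        if item and item.lower() in lowered:
            problems.append(f"contains personal information: {item}")
    return problems

print(password_problems("pathology"))
print(password_problems("Maple5150!", personal_info=["Maple", "5150"]))
print(password_problems("Tr4il;mix#Q"))  # []
```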

117. Internet protocols like telnet and FTP have a particular security weakness when user authentication is based on standard account names and passwords. What is this weakness?

Answer: These protocols send the account name and password as clear text over the network at login time. If someone else is connected to the network and using network “sniffer” software to copy all messages passing across the network, the message log can be searched to find account names and passwords.

118. What does the term “Trojan horse” mean when applied to computer software?

Answer: Trojan horse software pretends to be an innocuous program, image, or document, but when it is run or opened, it carries out undesirable activities without the knowledge of the user. These activities could include deletion of files from the user’s computer, changing permissions and passwords on a user’s computer so others have access, installing software, or using the computer to attack other computers.

119. You are interested in evaluating a series of pathology reports in combination with other patient information to help standardize local pathology practice. According to HIPAA guidelines, what determines whether institutional review board (IRB) approval is required for your study?

Answer: Activities purely for local quality improvement do not need IRB approval. If the work is carried out with research funding, will provide preliminary data for a research funding (grant) application, or will be published in the research literature, it should be approved by the IRB.

120. What was the Belmont Report?

Answer: A 1979 government report that is considered the cornerstone of the ethical principles on which the federal regulations for the protection of human research subjects are based. Its 3 main principles (respect for persons, beneficence, and justice) are incorporated into IRB review practices.

121. Define loss of confidentiality; how is it different from loss of privacy?

Answer: Confidentiality refers to a situation in which information is provided to a second party or parties with the understanding that it will not be further distributed (ie, the information has a specific limited distribution). A loss of confidentiality occurs if that information is transmitted to others outside the restricted group. Loss of privacy occurs when personal information is transmitted to any second party, whether or not there is a confidentiality agreement.


122. There are about 40 000 human genes and ESTs (expressed sequence tags) available on the Affymetrix Gene Chips for gene expression. Make some reasonable assumptions and estimate, for a normal organ, the number of genes that will appear to be expressed significantly greater than or less than the population norm (95% confidence interval).

Answer: If we assume that the expression level of each gene is independent of the others and normally distributed in the population, then by chance alone approximately 5% of 40 000, or 2000, genes would be expected to fall outside the 95% interval and appear overexpressed or underexpressed.
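The 5% figure can be checked by simulation. This Python sketch draws 40 000 standard normal values, representing genes with no true change, and counts how many fall outside ±1.96 standard deviations (the two-sided 95% limits); the seed is arbitrary, so the count varies slightly with it but stays close to 2000.

```python
import random

N_GENES = 40_000
Z_CUTOFF = 1.96  # two-sided 95% limits of the standard normal distribution

def count_apparent_outliers(n_genes=N_GENES, seed=1):
    """Simulate genes with no true change and count those outside +/-1.96 SD."""
    rng = random.Random(seed)
    return sum(abs(rng.gauss(0.0, 1.0)) > Z_CUTOFF for _ in range(n_genes))

expected = N_GENES * 5 // 100  # 5% of 40 000
observed = count_apparent_outliers()
print(expected, observed)
```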

123. What is a “gene expression signature” for a tumor?

Answer: A collection of genes that are expressed consistently higher or lower in the tumor than is the population norm for nontumorous tissue of the same type.

124. What are the benefits of high-throughput expression analysis in molecular biological investigations?

Answer: These techniques allow simultaneous analysis of the expression of many genes. Patterns of increased or decreased expression associated with disease may then be identified and may contribute to improved diagnosis and prognosis.

Design and Usability Testing

125. To what 2 aspects of a usability assessment should the word “representative” apply?

Answer: (1) Representative tasks and (2) representative users.

126. Dr Jones is a pathologist who has recently developed a virtual microscope program for teenagers to use while they are hospitalized. He thinks that teens might use this program to learn more about their illness and the role that pathology plays in the hospital. He has heard that usability is important, so he asks a few of his colleagues to try the program out in their free time and to e-mail their comments to him. Describe 3 specific things that Dr Jones should have done differently in order to more appropriately assess the usability of his program.

Answer: (1) Have teens test the program, not physicians; (2) select specific representative tasks for the assessment (not just “try the program out”); and (3) observe the participants using the program and have them think aloud during the session rather than e-mail comments.

127. Name 2 principles of good design and give a brief description or example of each.

Answer: (1) Use natural mappings; if possible, have controls laid out in a way that matches corresponding items in the physical world (eg, controls for stovetop should match burner layout). (2) Speak the user’s language; that is, use terms and phrases that users will understand (eg, “5 names returned” vs “5 records returned,” “call displayed #” vs “call display”).

128. Give 2 reasons why user interfaces are difficult to design.

Answer: (1) It is difficult to think like someone other than yourself and, in most cases, the designer or programmer has a very different perspective than the user. (2) Designing an interface is an iterative process, requiring the development team to repeatedly redesign and test the interface with actual users.

The author thanks the following faculty and staff who contributed to the Pathology Informatics rotation for help in compiling and reviewing the question list: Gary Blank, PhD; Rebecca Crowley, MD, MS; John Gilbertson, MD, PhD; William Gross; John Houston; Valerie Monaco, PhD; and Michael Sendek.


1. Harrison JH, Stewart J. Training in pathology informatics: implementation at the University of Pittsburgh. Arch Pathol Lab Med. 2003;127:1019-1025.

Accepted for publication August 18, 2003.

From the Center for Pathology Informatics, University of Pittsburgh Medical Center, Pittsburgh, Pa.

Reprints: James H. Harrison, Jr, MD, PhD, Center for Pathology Informatics, University of Pittsburgh Medical Center, Cancer Pavilion, Third Floor, 5150 Centre Ave, Pittsburgh, PA 15232 (e-mail: jhrsn@

Copyright College of American Pathologists Jan 2004
