Until recently, the practice of pathology has been entirely “human-driven”. Well-trained pathologists examine all tissues and arrive at diagnoses based on their application of learned criteria and experience. However, it is well known that the accuracy of human interpretation can be hampered by subjectivity (inter-observer variability), inconsistency (intra-observer variability), and fatigue. Recently, the rise of digital methods in pathology has led to a growing interest in applying artificial intelligence (AI) to aid, or even improve on, the analysis of medical specimens. For the pathologist, AI has the potential to improve accuracy, productivity, and workflow by allowing the computer to do what it does well: consume lots of data, recognize patterns, and perform automated analyses. Objective and reproducible specimen examination, along with automated analysis, could directly address the variability and fatigue described above.
Let us first explain what we mean by “AI,” since the term is used liberally and sometimes synonymously with related concepts such as computer vision, machine learning, and deep learning. We define an AI algorithm as one in which the computer exhibits its own “intelligence” by “learning” the correct answer to a problem. In medical imaging, the problem is usually to identify (detect) and interpret (classify) different phenomena in the image data. Any classifier that uses training data (examples annotated as belonging to a set of classification categories) to identify mathematical discriminators separating those categories displays a basic kind of AI. Often, simple classifiers rely on predefined features computed from the raw data as metadata to perform the classification, in which case artificial intelligence (feature-based classification) is helped by human intelligence (definition of the features). However, convolutional neural networks (CNNs) take this a step further: rather than relying on human-defined features, they learn the discriminating features directly from the raw image data during training.
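To make this distinction concrete, here is a minimal sketch of a feature-based classifier. The features (mean intensity and intensity spread), the nearest-centroid rule, and the toy data are all hypothetical choices for illustration: the human defines the features, and the “learning” consists of deriving a simple mathematical discriminator from annotated training examples.

```python
# Feature-based classification sketch (hypothetical features and data).
# Human intelligence defines the features; the algorithm "learns" class
# centroids from annotated training data and classifies by nearest centroid.

def extract_features(image):
    """Compute predefined features from a raw image (a 2-D list of pixels)."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return (mean, spread)

def train_nearest_centroid(training_data):
    """training_data: list of (image, label). Returns per-class feature centroids."""
    sums, counts = {}, {}
    for image, label in training_data:
        f = extract_features(image)
        s = sums.setdefault(label, [0.0] * len(f))
        for i, v in enumerate(f):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in s)
            for label, s in sums.items()}

def classify(image, centroids):
    """Assign the class whose centroid is closest in feature space."""
    f = extract_features(image)
    return min(centroids,
               key=lambda label: sum((a - b) ** 2
                                     for a, b in zip(f, centroids[label])))

# Toy example: "tumor" tiles are darker than "normal" tiles.
train = [([[10, 12], [11, 9]], "tumor"), ([[80, 82], [79, 81]], "normal")]
centroids = train_nearest_centroid(train)
print(classify([[14, 13], [12, 15]], centroids))  # → tumor
```

A CNN replaces the hand-written `extract_features` step with convolutional filters whose weights are themselves learned from the raw pixels.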
How to Set Up an AI Workflow
There are several important steps involved in setting up an AI workflow. The first step is to define the input data to the system. For example, one might specify that the input images to be classified are rectangular sub-images of a certain size derived from an overlapping grid applied to a whole slide image (WSI). Second, the input data should be divided into training data (used to train the classifier) and test data (used to evaluate it). Third, both training and test data should be annotated by an expert to establish a ground-truth classification category for each input image. Fourth, the CNN architecture needs to be defined: most importantly, the number and size of the convolutional layers that apply convolutional filters to derive the CNN features, and the number and size of the fully connected layers that compute the weights determining the final classification. Additional layers may be added in constructing the CNN, including pooling layers that combine information from previous layers, “activation” layers that introduce nonlinearity into the network to improve the convergence of training, and dropout layers that reduce overfitting. Fifth, the CNN should be trained on the training data. This is the most time-consuming part of the process; it produces a classifier that has learned, from the annotated training data, which types of input images belong to each classification category. Sixth, the trained CNN should be applied to the test data to see how it classifies each example in the test set. Finally, the CNN's classifications of the test data should be evaluated by comparing them against the ground-truth annotations.
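The first two steps above, defining input tiles on an overlapping grid over a WSI and splitting them into training and test sets, can be sketched as follows. The slide dimensions, tile size, stride, and split fraction are arbitrary choices for illustration, not recommendations:

```python
import random

def grid_tiles(slide_width, slide_height, tile_size, stride):
    """Top-left coordinates of tiles on a grid over a WSI.
    A stride smaller than tile_size makes adjacent tiles overlap."""
    coords = []
    for y in range(0, slide_height - tile_size + 1, stride):
        for x in range(0, slide_width - tile_size + 1, stride):
            coords.append((x, y))
    return coords

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle reproducibly, then split into training and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]

# A 1000x1000-pixel slide, 256-pixel tiles, 128-pixel stride (50 % overlap).
tiles = grid_tiles(1000, 1000, tile_size=256, stride=128)
train_tiles, test_tiles = train_test_split(tiles)
print(len(tiles), len(train_tiles), len(test_tiles))  # → 36 29 7
```

Keeping the split reproducible (via a fixed seed) and disjoint matters: any tile that appears in both sets would let the classifier be evaluated on data it has already seen, inflating the apparent performance.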
For detection problems with only two classification categories (“present” or “absent”), evaluation is in terms of sensitivity (the probability that the CNN declares “present” when the ground-truth annotation is “present”) and specificity (the probability that the CNN declares “absent” when the ground-truth annotation is “absent”). When there are multiple classification categories, the evaluation takes the form of a results matrix (termed a confusion matrix) in which each row displays the probability of correct classification for a given category as well as the probabilities of misclassifying that category as each of the other categories.
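These evaluation metrics follow directly from paired ground-truth and predicted labels. The sketch below (with made-up predictions) computes sensitivity, specificity, and a row-normalized confusion matrix as just defined:

```python
def sensitivity_specificity(truth, predicted, positive="present"):
    """Two-class evaluation against ground-truth annotations."""
    tp = sum(1 for t, p in zip(truth, predicted)
             if t == positive and p == positive)   # true positives
    tn = sum(1 for t, p in zip(truth, predicted)
             if t != positive and p != positive)   # true negatives
    n_pos = sum(1 for t in truth if t == positive)
    n_neg = len(truth) - n_pos
    return tp / n_pos, tn / n_neg

def confusion_matrix(truth, predicted, categories):
    """Row i, column j: probability that category i is classified as category j.
    Diagonal entries are the per-category correct-classification rates."""
    counts = {c: [0] * len(categories) for c in categories}
    for t, p in zip(truth, predicted):
        counts[t][categories.index(p)] += 1
    return [[n / max(sum(counts[c]), 1) for n in counts[c]]
            for c in categories]

truth     = ["present", "present", "absent", "absent", "present"]
predicted = ["present", "absent",  "absent", "present", "present"]
sens, spec = sensitivity_specificity(truth, predicted)
print(sens, spec)  # sensitivity 2/3, specificity 1/2
matrix = confusion_matrix(truth, predicted, ["present", "absent"])
```

Note that in the two-category case, sensitivity and specificity are simply the two diagonal entries of the confusion matrix.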