Patient samples
All patients participating in this study gave full informed consent. Peripheral blood samples were obtained from 20 patients attending the Leeds Head and Neck multidisciplinary clinic. All patients were new referrals to the clinic, although this included patients who had previously undergone resection of cancer and had re-presented with recurrence. Patients had tumours of the head and neck (tongue, larynx, skin, and salivary glands). The tumours were of various types, grades and stages.
Control samples were obtained from patients attending the Leeds Chest Clinic. These patients had a range of respiratory diseases, including asthma, chronic obstructive pulmonary disease (COPD), and bronchiectasis. Patients with a history of cancer or ongoing treatment for cancer were excluded from the study.
Ethics
Ethical opinion was sought from, and granted by, the Leeds West ethical committee (Ref: 06/Q1205/226).
Two blood samples were collected from each patient in specialist blood tubes containing 3.2% sodium citrate. Samples were centrifuged immediately at 3000 rpm for approximately 10 minutes. The supernatant (plasma) was removed and split into four cryovials, which were stored in a -80°C freezer until use.
Raman spectroscopy
The Raman spectra were obtained from all samples using a Renishaw 'System 1000' Raman microscope. Excitation was provided by a Sacher Lasertechnik Littrow external cavity laser set at 783 nm. The Raman scattered light was detected with a Renishaw RenCam NIR-enhanced, thermoelectrically cooled CCD camera. The spectrometer was coupled to a Leica DMLM microscope; the exciting light was delivered to the sample, and the scattered light collected from it, via a 50× Leica microscope objective. The spectrometer used holographic notch filters to remove Rayleigh scattered light from the collected light. The Raman scattered light was then dispersed across the CCD array detector by a single-stage, 250 mm focal length grating spectrometer. The microscope was equipped with a motorised XYZ positioning stage (Prior) with integrated position sensors on the X and Y axes (Renishaw). Instrument control and data collection were performed with Renishaw WiRE software, which operates within Galactic GRAMS software.
Raman measurements
Each sample was thawed and pipetted onto a quartz microscope slide, then allowed to air dry before Raman spectroscopy measurements were taken. An extended spectrum covering Raman shifts from 600 to 1800 cm⁻¹ was recorded. The acquisition time for each spectral reading was 20 seconds for all samples. The microscope lens was a 50× air objective with a numerical aperture of 0.75. Ten spectra were obtained from each plasma sample; these could then be averaged to give a mean spectrum for each sample. The plasma samples from the cancer and non-cancer patients were run alternately so as to rule out any possible influence of time of day and machine variability.
Data normalisation
The intensity of each Raman spectrum depends not only on sample characteristics but also on the operating equipment. The equipment used was uncalibrated, as an experimental calibration protocol has yet to be devised. The raw data was therefore normalised so that spectra could be compared across samples, as follows:
- The data was re-sampled by interpolation between the closest points, so that all measurements correspond to the same Raman shift points (601, 602, ..., 1800).
- Each spectrum was normalised so that the area under the curve between Raman shifts of 700 and 1400 was equal across spectra. The normalisation factor was based on this section alone because much greater variations arise in the spectra outside this range; however, the same multiplicative factor was applied to the entire spectrum for each sample. The normalisation process appeared to align the spectral intensities well. The actual area under the whole curve varied between 1.6 and 1.9.
- The data was smoothed with a Gaussian window function (a weighted linear filter) to reduce noise. The weights used were [0.0146 0.0831 0.2356 0.3333 0.2356 0.0831 0.0146]. Given the measurement techniques used, we expected the spectra to be locally smooth, and this procedure helped to reduce the measurement noise seen in the data.
- A mean spectrum was then produced for each patient for use in the analysis.
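The normalisation steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the study. The function names are hypothetical, and two details are assumptions not stated in the text: the area between 700 and 1400 is scaled to 1, and the ends of each spectrum are padded with their end values before smoothing.

```python
from bisect import bisect_left

# Common Raman shift grid from the text: 601, 602, ..., 1800 (1200 points).
GRID = list(range(601, 1801))
# Smoothing weights quoted in the text.
WEIGHTS = [0.0146, 0.0831, 0.2356, 0.3333, 0.2356, 0.0831, 0.0146]


def resample(shifts, values, grid):
    """Linearly interpolate a spectrum onto `grid` (shifts must be ascending)."""
    out = []
    for x in grid:
        i = bisect_left(shifts, x)
        if i == 0:
            out.append(values[0])
        elif i == len(shifts):
            out.append(values[-1])
        else:
            x0, x1 = shifts[i - 1], shifts[i]
            t = (x - x0) / (x1 - x0)
            out.append(values[i - 1] + t * (values[i] - values[i - 1]))
    return out


def band_area(grid, values, lo=700, hi=1400):
    """Trapezoidal area under the curve for lo <= shift <= hi."""
    pts = [(x, v) for x, v in zip(grid, values) if lo <= x <= hi]
    return sum((x1 - x0) * (v0 + v1) / 2
               for (x0, v0), (x1, v1) in zip(pts, pts[1:]))


def smooth(values, weights=WEIGHTS):
    """Weighted moving average; edge padding with end values is an assumption."""
    half = len(weights) // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [sum(w * padded[i + j] for j, w in enumerate(weights))
            for i in range(len(values))]


def preprocess(shifts, values, grid=GRID):
    """Resample, scale by one factor for the whole spectrum, then smooth."""
    resampled = resample(shifts, values, grid)
    scale = 1.0 / band_area(grid, resampled)  # unit band area is an assumed target
    return smooth([v * scale for v in resampled])


def mean_spectrum(spectra):
    """Point-wise mean of several spectra that share the same grid."""
    return [sum(col) / len(col) for col in zip(*spectra)]
```

Under these assumptions, a patient's mean spectrum would be `mean_spectrum([preprocess(s, v) for s, v in readings])`, where `readings` is a hypothetical list of the ten (shift, intensity) recordings for that sample.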
Classification
Raman spectra are high-dimensional data sets: each spectrum contains 1200 Raman shift intensities. We wanted to distinguish between cancer and non-cancer (respiratory) samples using the mean spectrum for each patient. It is often useful to reduce the dimensionality of the data, and here we considered three ways of doing this: (1) choosing the 25 best features from the 1200 data points, using a two-sample t-test to identify good features for the binary classification task; (2) using principal component analysis (PCA) to map the data to a lower-dimensional space, here with 25 components; and (3) combining the two, first choosing the 100 best features and then using PCA to reduce these to 25. To discriminate between the cancer and non-cancer cohorts, linear discriminant analysis (LDA) was performed on the reduced spectra. To evaluate performance, ten-fold cross-validation was repeated 100 times and the average performance calculated, to avoid bias from any single partition.
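As an illustration of option (1) and the repeated cross-validation, the sketch below ranks features by a two-sample t statistic and averages accuracy over repeated ten-fold partitions. It is not the study's code and makes several assumptions: the Welch (unequal variance) form of the t statistic is used because the text does not say which form was applied, features are re-selected within each training fold, and a simple nearest-class-mean rule stands in for LDA to keep the example dependency-free. The PCA variants are omitted, and all function names are hypothetical.

```python
import random


def _mean_var(v):
    """Sample mean and unbiased variance (needs at least two values)."""
    m = sum(v) / len(v)
    return m, sum((x - m) ** 2 for x in v) / (len(v) - 1)


def welch_t(a, b):
    """Two-sample t statistic, Welch form (assumed)."""
    (ma, va), (mb, vb) = _mean_var(a), _mean_var(b)
    se2 = va / len(a) + vb / len(b)
    return (ma - mb) / se2 ** 0.5 if se2 > 0 else 0.0


def top_features(X, y, k):
    """Indices of the k features with the largest |t| between classes 1 and 0."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 1]
        b = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(welch_t(a, b)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def nearest_mean_predict(X_train, y_train, X_test, k):
    """Stand-in classifier (not LDA): select features, pick the nearer class mean."""
    cols = top_features(X_train, y_train, k)
    centroid = {}
    for c in (0, 1):
        rows = [r for r, label in zip(X_train, y_train) if label == c]
        centroid[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def dist(row, c):
        return sum((row[j] - m) ** 2 for j, m in zip(cols, centroid[c]))

    return [0 if dist(r, 0) <= dist(r, 1) else 1 for r in X_test]


def repeated_cv_accuracy(X, y, k_features, folds=10, repeats=100, seed=0):
    """Mean accuracy over `repeats` random partitions into `folds` folds."""
    rng = random.Random(seed)
    n, accuracies = len(X), []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)  # a fresh random partition for every repeat
        correct = 0
        for f in range(folds):
            test = order[f::folds]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            preds = nearest_mean_predict([X[i] for i in train],
                                         [y[i] for i in train],
                                         [X[i] for i in test], k_features)
            correct += sum(p == y[i] for p, i in zip(preds, test))
        accuracies.append(correct / n)
    return sum(accuracies) / len(accuracies)
```

With one row of 1200 intensities per patient in `X` and labels (1 for cancer, 0 for control) in `y`, `repeated_cv_accuracy(X, y, k_features=25)` would mirror the 25-feature, 100-repeat setting described above. Selecting features inside each fold keeps the held-out samples from influencing which features are chosen.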
Genetic algorithm
The mean spectra were provided as input sequences to the Implicit Context Representation Cartesian Genetic Programming algorithm (IRCGP) [14, 15]. IRCGP uses an evolutionary computing methodology to learn classifiers that are capable of distinguishing between data classes. Induced classifiers take the form of programmatic expressions applied to particular offsets within the input data sequences. These expressions are composed from a set of simple mathematical functions. Both the choice and connectivity of the functions, and the choice of offsets used within the input sequences, are determined by the algorithm's evolutionary process. The input sequences were divided equally into training and test sets. To prevent over-learning, training of the classifiers was stopped once classification accuracy on the test sequences started to fall.