Decided to search for examples of AI in cancer prognosis for the literature search part of TMA01. I have started with a simple Google search for the terms: AI cancer prognosis.
I immediately read a useful article on the New Scientist website which describes using a neural network and fuzzy logic to predict the disease spread and 5-year survival rates for patients presenting with various types and stages of breast cancer. Link added: "Artificial Intelligence Tackles Breast Cancer".
I followed the link on the New Scientist site to the Biomedical Computing and Engineering Technologies Applied Research Group, who produced the original paper. Here there is a list of very useful publications and journal articles. I will use the OU's library to see if I can access the full text of some of these articles.
References on site that may be useful:
NAGUIB, R.N.G. and SHERBET, G.V. (2001) Artificial Neural Networks in Cancer Diagnosis, Prognosis and Patient Management, CRC Press. ISBN: 8493-9692
Seker H, Odetayo MO, Petrovic D and Naguib RNG. A Fuzzy Logic Based Method for Prognostic Decision Making in Breast and Prostate Cancers. IEEE Trans. Inform. Tech. Biomed., 2003, 7 (2): 114-122.
Seker H, Odetayo M, Petrovic D, Naguib RNG, Bartoli C, Alasio L, Lakshmi MS and Sherbet GV. Neuro-Fuzzy Rule-Based Intelligent Survival Analysis of Breast Cancer Patients Using Histological and Image Cytometric Prognostic Factors. Proc. Am. Assoc. for Cancer Res., Washington, DC, USA, 2003, 44: 4513.
Also discovered this article, which is very interesting and provides background on how prognosis is currently derived.
Matt Williams and Jon Williams. Combining Argumentation and Bayesian Nets for Breast Cancer Prognosis (Draft), May 2005.
Monday, 26 March 2007
Sunday 25/03/07
Just over a week to go till the TMA has to be submitted. I read through the TMA again and realised I still have a lot to do and really haven't planned as well as I should. I have downloaded the breast cancer data from the Machine Learning website and put it into MS Access. I have written some queries to look at each of the attributes in the table to see how they correlate with the classification of tumour recurrence or no recurrence. No obvious patterns seem to emerge from the data. I therefore think that whichever AI techniques I use, they must be able to handle uncertainty. It won't be possible to say patient x's tumour will recur, but it will be possible to give the probability that it will recur.
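The sort of query I ran in Access can also be sketched in a few lines of Python. This is only an illustration: the attribute names and the handful of rows below are made up for the example and are not taken from the real data, and `recurrence_rate_by_value` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy rows for illustration only (NOT real patient data).
# Each row holds some attribute values plus the class label.
rows = [
    {"node_caps": "yes", "deg_malig": 3, "recurrence": True},
    {"node_caps": "yes", "deg_malig": 2, "recurrence": False},
    {"node_caps": "no", "deg_malig": 1, "recurrence": False},
    {"node_caps": "no", "deg_malig": 3, "recurrence": True},
    {"node_caps": "no", "deg_malig": 2, "recurrence": False},
    {"node_caps": "no", "deg_malig": 1, "recurrence": False},
]


def recurrence_rate_by_value(rows, attribute):
    """Return {value: fraction of rows with that value that recurred}."""
    totals = defaultdict(int)
    recurred = defaultdict(int)
    for row in rows:
        value = row[attribute]
        totals[value] += 1
        if row["recurrence"]:
            recurred[value] += 1
    return {value: recurred[value] / totals[value] for value in totals}


rates_node = recurrence_rate_by_value(rows, "node_caps")
rates_malig = recurrence_rate_by_value(rows, "deg_malig")
```

Each rate is just an observed proportion for one attribute on its own, so it can only hint at a probability of recurrence. It says nothing about how the attributes combine.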
Saturday, 24 March 2007
Saturday 24th March
Received replacement CD yesterday but was sent the wrong one - Health & Social Sciences. I will have to ask again - more time wasted.
Used Google to search for "Data Classification" and "Survey Classification". I added a link to a paper on classification. I also searched for "Data Mining" which took me to Wikipedia again. Added that link too. I've ruled out my plan to use the RC Path workload idea because I will need to rely on the pathologists to score the requests. I don't think I can trust them to do it.
I am going to pop into WH Smiths to see what data is in the back of the Hi-Fi mags etc. The Top Gear magazine has lots of data about cars in the back. I have a few ideas using that.
Sunday, 18 March 2007
Sunday PM
Really not sure about this project proposal so far. I am going to download the breast cancer data from the machine learning repository and have a look at this.
Added some more links. Must do some more reading and find the CD-ROM.
Sunday 17/03/07
I read through the Royal College of Pathologists Workload Guidelines again trying to pick out data fields that contribute to the workload and that are already captured on the lab system. I decided that there are 7 data items currently held in the lab system WinPath that the RCPath Workload Guidelines state are factors contributing to the overall workload of the pathologist. These are:
- Specimen Type e.g. Breast Biopsy, Skin Biopsy etc. Approx 250 different types.
- Levels - Are stains at various levels through the tissue block required? The more slides cut, the longer it takes to examine them all.
- Number of Special Stains Requested
- Number of Immunocytochemistry Slides
- 2nd Pathologist's Opinion Required - Usually suspected cancer cases
- Minimum Dataset Completed
- Presentation at Multidisciplinary Team Meeting
- Number of Specimens - Each case may consist of more than one specimen. The greater the number of specimens, the greater the workload. The workload could be measured per specimen, but the RCPath Guidelines suggest a workload per case with each specimen contributing to the overall score. I have chosen to use just the most significant specimen type for each case, as this makes the greatest contribution to the overall work required to report the case. I will add some new specimen types to help deal with cases where we get multiple pots of low complexity that are more time consuming simply because of their number.
- Number of blocks - Generally the number of blocks represents the amount of sampling the specimen(s) require. Complex cases usually require greater numbers of blocks to be taken. Each block produces a minimum of one slide for the pathologist to examine.
- Extra H&E slides - A pathologist may request further levels on a block if the case is difficult to report.
- Pathologist/BMS Cut Up - Simple cases are often described and sampled by BMS staff and are only microscopically examined by the pathologist.
I aim to get the pathologists to assign the workload to their cases following the RC Path guidelines. Then extract all the data from the lab system and add the supplementary data field to each case (Pathologist/BMS Cut Up?). Process the data into a form acceptable to a NN. Train the network with the data including the workweight assigned by the pathologist, then test with data that doesn't include the assigned workweight and compare the result to the workweight assigned by the pathologists. The lab number can be used to match the workweight from the NN to that from the pathologist. I would use a NN with 11 input neurons for the 11 data items and 10 output neurons, one for each workweight.
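A rough sketch of how the 10 output neurons and the comparison by lab number might work, in Python. It assumes the workweights are the whole numbers 1 to 10, which I haven't confirmed against the guidelines, and all the function names and lab numbers are hypothetical. The network itself is not shown.

```python
NUM_WORKWEIGHTS = 10  # assumption: workweights are the whole numbers 1..10


def encode_workweight(workweight):
    """One-hot encode a workweight as the 10 target output values."""
    if not 1 <= workweight <= NUM_WORKWEIGHTS:
        raise ValueError(f"workweight out of range: {workweight}")
    return [1.0 if i == workweight - 1 else 0.0 for i in range(NUM_WORKWEIGHTS)]


def decode_outputs(outputs):
    """Pick the workweight whose output neuron has the highest activation."""
    best_index = max(range(len(outputs)), key=lambda i: outputs[i])
    return best_index + 1


def agreement_rate(pathologist_scores, network_scores):
    """Fraction of cases, matched on lab number, where both scores agree."""
    shared = pathologist_scores.keys() & network_scores.keys()
    if not shared:
        return 0.0
    matches = sum(
        1 for lab_no in shared
        if pathologist_scores[lab_no] == network_scores[lab_no]
    )
    return matches / len(shared)


# Hypothetical lab numbers and scores, for illustration only.
pathologist = {"H07-0001": 3, "H07-0002": 5}
network = {"H07-0001": 3, "H07-0002": 6}
rate = agreement_rate(pathologist, network)
```

Exact agreement is a harsh measure, as a network that is one workweight out counts the same as one that is five out. Looking at the size of the difference might be fairer.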
Problems:
- RCPath Guidelines, for want of a better description, are very woolly. I think a NN, due to its black box nature, can cope with the lack of clarity in the guidelines. However, a KBS needs rules to be able to classify each case, and rules are difficult to create from this data. Handling uncertainty - fuzzy logic etc - is one way to cope with this but I think writing the rules in the first place would be difficult. I would probably have to analyse all the data and assign average scores to each data item - possibly, maybe. I think a KBS might be useful to preprocess the specimen type data before passing it to the NN.
- Bias - I think the pathologists will over-score the workload. They are not going to admit to being anything other than overworked.
- Other factors that add to the pathologists' workload that aren't measured - e.g. training of junior staff.
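As a very rough sketch of the KBS preprocessing idea, a few keyword rules could collapse the 250 or so specimen types into a handful of complexity bands before they reach the NN. The keywords and band numbers below are placeholders I have made up for illustration. They are not taken from the RCPath guidelines, and `complexity_band` is a hypothetical name.

```python
# Hypothetical rules: (keyword found in the specimen type, complexity band).
# Keywords and bands are placeholders, NOT taken from the RCPath guidelines.
# Rules are checked in order, so the first match wins.
RULES = [
    ("resection", 3),
    ("excision", 2),
    ("biopsy", 1),
]
DEFAULT_BAND = 0  # no rule matched; would need looking at by hand


def complexity_band(specimen_type):
    """Map a free-text specimen type to a complexity band using the rules."""
    name = specimen_type.lower()
    for keyword, band in RULES:
        if keyword in name:
            return band
    return DEFAULT_BAND


bands = [complexity_band(s) for s in ("Breast Biopsy", "Skin Biopsy", "Unlisted Type")]
```

This replaces one input with around 250 possible values by one with four, which should be easier for the network to learn from. The hard part, as noted above, is deciding what the real rules should be.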
Last Week
Not much progress. Work commitments are difficult to manage, especially now that the Order Communications Project seems to have become important again.
I've read through the study guide again. I can't find the CD ROM it refers to. It must be here somewhere. There's too much junk by the PC for things to get lost in.
TMA01 deadline looming so I've booked time off work before this to help. It's very difficult to concentrate when the kids are bickering.
Monday, 5 March 2007
Monday Lunchtime
I've been searching Google using "database repository" as the search criteria and came across the BioMed Central Databases site. There are many databases of diseases listed. I looked at a couple of cancer ones but access to these was restricted. I will have a further look at this when I get home. I've added the link to the main site.
Sunday, 4 March 2007
Late Sunday 4th March
After having looked at the two database repositories suggested in the study guide I was pleased to see some databases containing medical data. I find this sort of thing interesting. I have enough data at work to use but I lack the clinical outcomes needed, which would be the classes that all the other attributes are used to predict. I will look at our Royal College of Pathologists Minimum Dataset data at work to see if there is any mileage in that, but if none of my ideas work I think the Breast Cancer data in the UCI repository of machine learning databases would offer me a fallback, which my tutor says I must have in case he feels my project is not suitable.
More Thoughts
So my biggest challenge on this project is going to be planning. I just don't work in an organised way. I'm sure I would probably get better results but I just don't stick to plans so I don't make them. I realise I am going to have to plan this properly.
I do have experience of doing a large project from my Fellowship of the Institute of Biomedical Science. I am pretty good at writing and producing reports. Certainly as far as the OU goes what my reports lack is substance! My knowledge of computing isn't the best, so I'll have to try harder.
Having read the study guide I realise that doing a project solely based around neural networks isn't enough. I am considering using a knowledge based system to in effect preprocess some of the data required into a form that a neural network can use, so in effect a hybrid system.
I have downloaded the Royal College of Pathologists publication Guidelines on Staffing and Workload for Histopathology and Cytopathology Departments (2nd Edition) 2005. I will read through this to see how practical it is to calculate workload for histology specimens from the data in our lab's computer system. At first glance it seems somewhat ambitious. I plan to look at the suggested online databases from the study guide.
I must email my tutor to tell him of the existence of the blog so that he can comment if he wants to.
A hectic weekend and seeing West Ham lose in injury time has not put me in the best of moods.
About Me
- Rob
- My goal in life is to become grumpier. There's no point getting older unless you become grumpier. Working for the NHS helps as does supporting West Ham, so one day I'll end up like Victor Meldrew.