Sunday, 2 December 2007
And Finally
Anyway that's it. Top Gear is on in 30 mins and I'm off.
Sunday, 11 November 2007
Almost There
Saturday, 3 November 2007
Monday, 22 October 2007
Monday 22nd October
Began write up today. I will just carry on from the draft really as there wasn't much wrong with that. Word 2007 does make it look impressive. So even if it's crap it will look good.
Sunday, 21 October 2007
Sunday 20th
A few observations about these KBS:
Is there any point in putting three rules in for the grade, as by inference if it’s not one then it’s one of the other two? Would two rules be enough?
There are no cases in the training set with 18+ lymph nodes but this rule does exert an influence on the other cases. Do I remove it?
Currently modifying original KBS to test this.
Saturday, 20 October 2007
Saturday 19th October
I have in the meantime modified the certainty factors in the Flex program to try to improve the performance, but without success.
Will complete this practical work and begin writing up the results and the project this weekend.
T-25 days
Friday, 19 October 2007
Friday 19th Arrrggghhhh!
C.F. : TRY : r8c
C.F. : LOOKUP : (grade is '3') = -1
C.F. : IMPLIES : cf(0.35) @ -1 -> -0.35
C.F. : LOOKUP : (prognosis is reccurence) = 0.2
C.F. : CONFIRMS : 0.2 + -0.35 -> -0.1875
C.F. : UPDATE : (prognosis is reccurence) = -0.1875
C.F. : FIRED : r8c
Looked at the certainty factor example in Chapter 3 of T396 and it doesn't simply add certainty factors together.
Start testing KBS again!
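A minimal sketch, in Python rather than Flex, of how the CONFIRMS step appears to combine certainty factors. It uses the standard MYCIN-style combination function; that Flex uses exactly this function internally is an assumption, although it reproduces every CONFIRMS line in the traces in this log. The function name `combine_cf` is made up for illustration.

```python
import math


def combine_cf(current: float, new: float) -> float:
    """Combine an existing certainty factor with a new one (both in [-1, 1]).

    MYCIN-style combination, assumed here to be what the CONFIRMS step does.
    """
    if current >= 0 and new >= 0:
        # Both supportive: move part of the remaining way towards +1.
        return current + new * (1 - current)
    if current <= 0 and new <= 0:
        # Both against: move part of the remaining way towards -1.
        return current + new * (1 + current)
    # Opposite signs: plain sum, rescaled by the weaker of the two.
    # Undefined (division by zero) when one value is +1 and the other is -1.
    return (current + new) / (1 - min(abs(current), abs(new)))


# Values taken from the CONFIRMS lines in the traces.
print(combine_cf(0.2, -0.35))
print(combine_cf(0.1, 0.5))
print(combine_cf(-0.35, -0.1))
print(combine_cf(-0.5, -0.25))
```

This would explain why -0.5 and -0.25 give -0.625 and not -0.75: the second value is scaled by how much room is left before -1.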
Friday
I identified those cases which gave false +ve and -ve results and added them to the training sets but performance deteriorated. The data is ambiguous. Some patients have virtually the same sets of data but different outcomes. There are not enough data items to improve the performance - no ER/PR results etc.
Preprocessed the training set for the KBS. I couldn't get Excel to output a .csv file so I had to produce it manually. I then copied these into Word so that I could paste whole strings into the console window of Flex.
Today I have been running these and have noticed that rule 8c is not working properly: it replaces the certainty from the previous rules instead of combining with it.
C.F. : TRY : r9c
C.F. : LOOKUP : (location is central) = -1
C.F. : LOOKUP : (grade is '1') = 1
C.F. : AND : -1 + 1 -> -1
C.F. : IMPLIES : cf(-0.5) @ -1 -> 0.5
C.F. : LOOKUP : (prognosis is reccurence) = 0.1
C.F. : CONFIRMS : 0.1 + 0.5 -> 0.55
C.F. : UPDATE : (prognosis is reccurence) = 0.55
C.F. : FIRED : r9c
C.F. : TRY : r8c
C.F. : LOOKUP : (grade is '3') = -1
C.F. : IMPLIES : cf(0.35) @ -1 -> -0.35
C.F. : UPDATE : (prognosis is recurrence) = -0.35
C.F. : FIRED : r8c
C.F. : TRY : r7c
C.F. : LOOKUP : (grade is '2') = -1
C.F. : IMPLIES : cf(0.1) @ -1 -> -0.1
C.F. : LOOKUP : (prognosis is recurrence) = -0.35
C.F. : CONFIRMS : -0.35 + -0.1 -> -0.415
C.F. : UPDATE : (prognosis is recurrence) = -0.415
C.F. : FIRED : r7c
I need to sort this before continuing.
Also, if the C.F. : CONFIRMS lines are supposed to be adding the numbers together, they are wrong.
C.F. : TRY : r11c
C.F. : LOOKUP : (size is less_than_15) = 1
C.F. : IMPLIES : cf(-0.25) @ 1 -> -0.25
C.F. : LOOKUP : (prognosis is reccurence) = -0.5
C.F. : CONFIRMS : -0.5 + -0.25 -> -0.625 [-0.5+ -0.25 = -0.75]
C.F. : UPDATE : (prognosis is reccurence) = -0.625
C.F. : FIRED : r11c
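The AND and IMPLIES lines in these traces look consistent with taking the minimum of the evidence certainties and then multiplying by the rule's certainty factor. The Python sketch below replays those two steps with numbers from the traces. The function names are made up, and treating AND as a minimum and IMPLIES as a product is an inference from the trace output, not something taken from the Flex manual.

```python
def cf_and(*evidence: float) -> float:
    """Certainty of a conjunction: the weakest piece of evidence (assumed)."""
    return min(evidence)


def cf_implies(rule_cf: float, evidence_cf: float) -> float:
    """Certainty passed on by a rule: rule CF scaled by evidence CF (assumed)."""
    return rule_cf * evidence_cf


# r9c: (location is central) = -1 and (grade is '1') = 1, rule cf(-0.5)
r9c_evidence = cf_and(-1, 1)
r9c_implied = cf_implies(-0.5, r9c_evidence)

# r8c: (grade is '3') = -1, rule cf(0.35)
r8c_implied = cf_implies(0.35, -1)

# r11c: (size is less_than_15) = 1, rule cf(-0.25)
r11c_implied = cf_implies(-0.25, 1)

print(r9c_evidence, r9c_implied, r8c_implied, r11c_implied)
```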
Wednesday, 17 October 2007
Wednesday 17th October 11p.m.
The results of an MLP with 10 hidden layer neurons and 50000 training iterations were poorer than using the random set.
Next try a training set of all false +ves and -ves plus the original training set.
Wednesday 17th October 11a.m.
First up, complete the neural network experiments. I'm not happy with the ones I've carried out before; they are too ad hoc. I will organise them better this time.
So first create an MLP and work out the optimum number of training cycles. Do each experiment at least twice because NNs do not always cluster data the same way each time they are trained.
Score with score tool and calculate sensitivity, specificity and PPV.
Then experiment using different numbers of hidden layer neurons to find optimum.
Then, well I'll see how it's going.
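For the scoring step, a short Python sketch of the three measures, calculated from counts of true/false positives and negatives (taking "positive" to mean predicted recurrence). The function names and the example counts are made up for illustration; they are not results from this project.

```python
from typing import Optional


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity and positive predictive value from counts."""
    return {
        # Proportion of actual recurrences that were predicted.
        "sensitivity": _ratio(tp, tp + fn),
        # Proportion of actual non-recurrences that were predicted.
        "specificity": _ratio(tn, tn + fp),
        # Proportion of predicted recurrences that were correct.
        "ppv": _ratio(tp, tp + fp),
    }


# Illustrative counts only.
print(screening_metrics(tp=8, fp=2, tn=6, fn=4))
```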
Saturday, 22 September 2007
22nd September.
Tuesday, 4 September 2007
4th September
Wednesday, 15 August 2007
Wednesday 15th August
uncertainty_rule r4c
if the involved_nodes is '>=6 and <=17'
then the prognosis is reccurence
with certainty factor 0.20 .
The equivalent input in the neural network will be given by whether the statement:
Involved node is greater than or equal to 6 but less than or equal to 17
is true or false.
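As a small Python sketch of that equivalent input: the condition of rule r4c becomes a single value that is 1.0 when the statement is true and 0.0 when it is false. The function name and the 1.0/0.0 coding are assumptions; the actual network may code true and false differently.

```python
def nodes_6_to_17_input(involved_nodes: int) -> float:
    """Neural network input matching the condition of rule r4c.

    1.0 if the number of involved nodes is between 6 and 17 inclusive,
    otherwise 0.0 (the 1.0/0.0 coding is an assumption).
    """
    return 1.0 if 6 <= involved_nodes <= 17 else 0.0


for nodes in (5, 6, 17, 18):
    print(nodes, nodes_6_to_17_input(nodes))
```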
Monday, 13 August 2007
Monday 13th August - Oops
Monday 13/08/07
I've actually started to enjoy using Flex. I've been modifying the program that uses certainty factors and added more rules (kbs 6 and 7) and have tested with a couple of patients and it seems to work fine. The certainties of the evidence which are used at the start of the program have caused much head scratching. For example:
All patients under 30 years old do not suffer recurrence. So I can assign a certainty of -1.0 to the rule for the certainty that the patient will suffer recurrence. But if the patient is over 30 then the statement "is the patient under 30" is definitely not true, so the certainty factor in the starting statement entered in the console would be -1, implying that recurrence must occur, which isn't true, I think.
So I've fiddled with the certainty factors in the rules to try to overcome this problem and tested with -1 or 0 in the starting statement when the condition is not true.
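The problem can be seen with a little arithmetic. The Python sketch below assumes that the certainty passed on by a rule is the rule's certainty factor multiplied by the certainty of the evidence; that is an assumption about how Flex behaves, and the names are made up for illustration.

```python
def implied_cf(rule_cf: float, evidence_cf: float) -> float:
    """Certainty passed on by a rule: rule CF times evidence CF (assumed)."""
    return rule_cf * evidence_cf


# Rule: if the patient is under 30 then recurrence, with certainty -1.0.
UNDER_30_RULE_CF = -1.0

# Patient definitely under 30: recurrence is ruled out.
under_30 = implied_cf(UNDER_30_RULE_CF, 1.0)

# Patient definitely over 30, entered as -1: the rule now says recurrence
# is certain, which is the unwanted result described above.
over_30_as_minus_one = implied_cf(UNDER_30_RULE_CF, -1.0)

# Patient over 30, entered as 0: the rule contributes nothing either way.
over_30_as_zero = implied_cf(UNDER_30_RULE_CF, 0.0)

print(under_30, over_30_as_minus_one, over_30_as_zero)
```

Under that assumption, entering 0 rather than -1 when the condition is not true keeps the rule from pushing the prognosis towards recurrence.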
I am actually finding the TMA a pain because I actually have some enthusiasm for Flex at the moment and want to get on, but I have to finish the TMA.
Saturday, 11 August 2007
Saturday 11th August
Histopathology
Volume 19 Issue 5 Page 403-410, November 1991
To cite this article: C.W. ELSTON, I.O. ELLIS (1991) Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology 19 (5), 403–410.
Need to look for it at work.
Friday, 10 August 2007
Friday 11th August
Will read through Hopgood and the example in block one of T396. I think I will need fewer rules with "or" statements in them.
TMA really needs to get moving tomorrow. Not sure how to tackle it. Too late to contact tutor now. Will do the best I can in the time I have.
Must state that all I want is to pass this project. After 7 years of OU I am utterly fed up with studying.
Football season kicks off tomorrow. The only time I will be away from this PC will be to watch the Hammers.
Wednesday, 8 August 2007
Wednesday 8th August
Spent some time modifying my Flex program using certainty factors. It is almost working. Hopefully a little more time will sort it out. I will see if I can add to the 6 rules I have. If not I will test when it is working. Need to sort out testing and scoring strategy. Probably use a similar score tool as the NN but with fewer instances to test. Up to 50 patients would be enough.
Then I'll modify the NN so that it has the same data inputs as the flex program and test. That way I'll be testing like with like.
Probably resume this work now after the TMA.
Sunday, 5 August 2007
Later Sunday
Decided just to run the flex program with uncertainty rules only as I am sure I can get it to work, though it is cumbersome to test.
Noticed in flex manual that it states:
Given a rule:
rule1: if A & B then C
there are 3 potential areas for uncertainty.
- Uncertainty in data (how true are A and B)
- Uncertainty in the rule (how often does A and B imply C)
- Impreciseness in general
The first 2 can be handled using probabilities and the third using fuzzy logic.
The main sources of uncertainty in this project are in the data (it is difficult to measure tumours accurately, etc.) and in the rules: there are only four rules that are always true according to the statistical analysis of this dataset, and these are probably not always true with other datasets. Therefore I was justified in using uncertainty rules to deal with these sources of uncertainty.
Sunday 5th August
Back to work.
Saturday, 21 July 2007
Saturday 21st July
I decided to use certainty factors mainly because I feel I can understand them, and also because after reading Hopgood I discovered they are for uncertain data and rules, not uncertain language.
Anyway off on holiday now. Will have to resume when I get back. I am feeling much more confident that I can get a reasonable flex program working.
Also started a draft of the project, which is required for TMA03.
Sunday, 15 July 2007
Sunday afternoon
Had a read through a tutorial with flex which explained that putting the numbers in between ' would treat them as atoms not as numbers. I have adjusted the code and the program now runs fine.
question involved_nodes
how many involved_nodes are there ? ;
choose from no_involved_nodes.
Sunday 15th July
I have switched my attention to the knowledge based system. I have been having trouble with Flex because I was using questions with the answers in groups but the answers were numbers and I couldn't get flex to treat them as just text for the purposes of these questions. I have therefore developed questions in a different way. I will test this shortly.
Monday, 9 July 2007
Monday 09/07/2007
Sunday, 1 July 2007
Sunday 1st July
Three weeks till hols so not much time.
Tuesday, 5 June 2007
Tuesday 5th June
Monday, 4 June 2007
Monday 4th June
Sunday, 3 June 2007
Sunday Evening
Now back to the TMA.
Sunday 3rd June
Started TMA write up yesterday. Should be OK.
Saved the training file as an MS-DOS text file from Excel and this trains the MLP without error. The test data saved in the same format does run, but the output file does not contain data that can be interpreted. I will have to spend some time looking at this.
Wednesday, 30 May 2007
Wednesday 30th May
Tuesday, 29 May 2007
Tuesday 29th May
Sunday, 27 May 2007
Sunday 27th May
I will now carry on with this preprocessing, which is rather laborious because I am doing it manually. I do not have the expertise to convert the raw data into the format required by the neural network any other way. There are likely to be mistakes in this conversion but I am trying to be as meticulous as I can.
Saturday, 19 May 2007
Saturday 19th May
For example there seems to be no significant relationship between the location of the tumour in the breast and the chances of recurrence of the tumour, apart from a much lower incidence of recurrence if the tumour is located centrally. Therefore I will cut this down from 5 input neurons (one for each quadrant) to 2 input neurons (central or not). There are several other possible situations where the number of input neurons could be reduced.
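A Python sketch of what that reduction looks like as an encoding. The five location labels and the ordering of the values are made up for illustration; the dataset's own labels are not given here.

```python
# Hypothetical location labels: four quadrants plus central.
LOCATIONS = ["upper_outer", "upper_inner", "lower_outer", "lower_inner", "central"]


def encode_location_five(location: str) -> list:
    """Original scheme: one input neuron per location (one-of-five coding)."""
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location}")
    return [1.0 if location == name else 0.0 for name in LOCATIONS]


def encode_location_two(location: str) -> list:
    """Reduced scheme: two input neurons, [central, not central]."""
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location}")
    return [1.0, 0.0] if location == "central" else [0.0, 1.0]


print(encode_location_five("lower_inner"), encode_location_two("lower_inner"))
print(encode_location_five("central"), encode_location_two("central"))
```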
I hope to very shortly complete the rules identification. I have four rules that are always true and the rest are mostly true so will need to be used with the certainty factors.
As an aside, and worth remembering when discussing this project, there is no indication of how long it was from when the tumour was removed to the assessment of whether there was recurrence or not, i.e. if the patients without recurrence were followed up a few years later, would the tumour still not have recurred?
Wednesday, 16 May 2007
16th May
Am spending so much time configuring the order comms system at work that I am less than keen to spend time using a computer when I get home. Still the weather's crap which helps.
Tuesday, 15 May 2007
15th May
Wednesday, 25 April 2007
Wednesday 25th April
Saturday, 21 April 2007
Saturday 21/04/07
It's been a really good day as I received my TMA result and was shocked at how good it was. I would have been happy with 50%. Anyway I mustn't let it go to my head. West Ham won too so great.
Tuesday, 3 April 2007
Tuesday 3rd April
Added link for background info about breast cancer prognosis
Sunday, 1 April 2007
Week beginning 25th March
I have decided to try to use the data from work if I can get the clinical outcomes, because I always have the breast cancer data that I have downloaded as a backup if this isn't achievable. I will have to be careful not to use patient identifiable data because of the Data Protection Act. Therefore I won't be able to use a date of birth field but age will be OK.
Downloaded several papers on breast cancer prognosis and artificial intelligence via IEEE Xplore. I am reading through these but generally they have used neural networks with some success and fuzzy logic in hybrid systems. I would like to use neural networks in a similar way although, providing I can use the data from work, I will use different prognostic indicators such as oestrogen and progesterone receptor status and HER2 status, which were not included in these papers. I will try training a multilayer perceptron and comparing this with a radial basis function network and/or Kohonen self organising network to look at supervised vs unsupervised learning. I will try using one of these networks that use unsupervised learning to cluster the data, coupled to an MLP to classify the data clusters.
I will try using a rule based system to compare its performance with that of the neural networks. I will try using fuzzy logic to produce rules based on analysing the data I have. For example: if the number of lymph nodes is high and the tumour is large then the chance of recurrence is high.
Now to sort out that TMA.
Monday, 26 March 2007
Monday 26/03/07
I immediately read a useful article on the New Scientist website which describes using a neural network and fuzzy logic to predict the disease spread and 5 year survival rates for patients presenting with various types and stages of breast cancer. Link added "Artificial Intelligence Tackles Breast Cancer"
I followed the link on the New Scientist site to the Biomedical Computing and Engineering Technologies Applied Research Group, who produced the original paper. Here there is a list of very useful publications and journal articles. I will use the OU's library to see if I can access the full text of some of these articles.
References on site that may be useful:
NAGUIB, R.N.G. and SHERBET, G.V. (2001) Artificial neural networks in cancer diagnosis, CRC Press. Prognosis and Patient Management ISBN: 8493-9692
Seker H, Odetayo MO, Petrovic D and Naguib RNG. A Fuzzy Logic Based Method for Prognostic Decision Making in Breast and Prostate Cancers. IEEE Trans. Inform. Tech. Biomed., 2003, 7 (2): 114-122.
Seker H, Odetayo M, Petrovic D, Naguib RNG, Bartoli C, Alasio L, Lakshmi MS and Sherbet GV. Neuro-Fuzzy Rule-Based Intelligent Survival Analysis of Breast Cancer Patients Using Histological and Image Cytometric Prognostic Factors. Proc. Am. Assoc. for Cancer Res., Washington, DC, USA, 2003, 44: 4513.
Also discovered this article, which is very interesting and provides background on how prognosis is currently derived.
Matt Williams and Jon Williams, Combining Argumentation and Bayesian Nets for Breast Cancer Prognosis (Draft), May 2005
Sunday 25/03/07
Saturday, 24 March 2007
Saturday 24th March
Used google to search for "Data Classification" and "Survey Classification". I added a link to a paper on classification. I also searched for "Data Mining" which took me to Wikipedia again. Added that link too. I've ruled out my plan to use the RC Path workload idea because I will need to rely on the pathologists to score the requests. I don't think I can trust them to do it.
I am going to pop into WH Smiths to see what data is in the back of the Hi-Fi mags etc. The Top Gear magazine has lots of data about cars in the back. I have a few ideas using that.
Sunday, 18 March 2007
Sunday PM
Added some more links. Must do some more reading and find the CD-ROM.
Sunday 17/03/07
- Specimen Type e.g. Breast Biopsy, Skin Biopsy etc. Approx 250 different types.
- Levels - Are stains at various levels through the tissue block required. The more slides cut the longer it takes to examine them all.
- Number of Special Stains Requested
- Number of Immunocytochemistry Slides
- 2nd Pathologists Opinion Required - Usually suspected cancer cases
- Minimum Dataset Completed
- Presentation at Multi Disciplinary Team Meeting
- Number of Specimens - Each case may consist of more than one specimen. The greater the number of specimens the greater the workload. The workload could be measured per specimen but the RCPath Guidelines suggest a workload per case with each specimen contributing to the overall score. I have chosen just to use the most major specimen type for each case as this has the greatest contribution to the overall work required to report the case. I will add some new specimen types to help deal with cases where we get multiple pots with a low complexity that are more time consuming because of their number.
- Number of blocks - Generally the number of blocks represents the amount of sampling the specimen(s) require. Complex cases usually require greater numbers of blocks to be taken. Each block produces a minimum of one slide for the pathologist to examine.
- Extra H&E slides - A pathologist may request further levels on a block if the case is difficult to report.
- Pathologist/BMS Cut Up - Simple cases are often described and sampled by BMS staff and are only microscopically examined by the pathologist.
I aim to get the pathologists to assign the workload to their cases following the RC Path guidelines. Then extract all the data from the lab system and add the supplementary data field to each case (Pathologist/BMS Cut Up?). Process the data to a form acceptable to a NN. Train the network with the data including the workweight assigned by the pathologist, then test with data that doesn't include the assigned workweight and compare this to the workweight assigned by the pathologists. The lab number can be used to compare the workweight from the NN to those from the pathologist. I would use a NN with 11 input neurons for the 11 data items and 10 output neurons for each workweight.
Problems:
- RCPath Guidelines, for want of a better description, are very woolly. I think a NN due to their black box nature can cope with the lack of clarity in the guidelines. However a KBS needs rules to be able to classify each case, and rules are difficult to create from this data. Handling uncertainty - fuzzy logic etc - is one way to cope with this but I think writing the rules in the first place would be difficult. I would probably have to analyse all the data and assign average scores to each data item - possibly, maybe. I think a KBS might be useful to preprocess the specimen type data before passing it to the NN.
- Bias - I think the pathologists will over-score the workload. They are not going to admit to being anything other than overworked.
- Other factors that add to the pathologist's workload that aren't measured - e.g. Training of Junior Staff.
Last Week
I've read through the study guide again. I can't find the CD ROM it refers to. It must be here somewhere. There's too much junk by the PC for things to get lost in.
TMA01 deadline looming so I've booked time off work before this to help. It's very difficult to concentrate when the kids are bickering.
Monday, 5 March 2007
Monday Lunchtime
Sunday, 4 March 2007
Late Sunday 4th March
More Thoughts
I do have experience of doing a large project from my Fellowship of the Institute of Biomedical Science. I am pretty good at writing well and producing reports. Certainly as far as the OU goes what my reports lack is substance! My knowledge of computing isn't the best, so I'll have to try harder.
Having read the study guide I realise that doing a project solely based around neural networks isn't enough. I am considering using a knowledge based system to in effect preprocess some of the data required into a form that a neural network can use, so in effect a hybrid system.
I have downloaded the Royal College of Pathologists publication Guidelines on Staffing and Workload for Histopathology and Cytopathology Departments (2nd Edition) 2005. I will read through this to see how practical it is to calculate workload for histology specimens from the data in our lab's computer system. On first glance it seems somewhat ambitious. I plan to look at the suggested online databases from the study guide.
I must email my tutor to tell him of the existence of the blog so that he can comment if he wants to.
A hectic weekend and seeing West Ham lose in injury time has not put me in the best of moods.
Monday, 26 February 2007
Monday 26/02/07
Sunday 25/02/2007
Opening Thoughts
The aim of this log is to show how I have developed this project from the beginning to the final document. I have an aversion to planning work properly so I will have to remain determined to write my thoughts in this log regularly. Up till now I have done some reading around AI techniques particularly Neural Networks. My experience of T396 was that I found Flex, the rule based system difficult to use but was much happier using the Neural Works. Therefore I really want to base this project around Neural Networks (NNs). So I want this project to look at the feasibility of using NNs in a particular situation rather than the more obvious project of comparing Rule Based Systems with NNs in a particular situation. That is my aim however I may find I need to consider Rule Based Systems as the project is implemented.
I have given a great deal of thought to what situation I will apply the NNs to. I have looked at one example project on the OU’s web site and read an interesting article about using a Kohonen Network to sort photos according to what is actually in the photo, rather like Adobe Photoshop. However I would really like to use data from work. The problem here though is getting hold of clinical outcomes of patients due to patient confidentiality. I would like to get results from various tests from Pathology, Radiography etc. and see if an AI system can predict the clinical outcome for the patient. Cancer treatment seems to be a good area. If I can find out details of the disease that was diagnosed, the histology and immunocytochemistry results, the patient's age, sex, treatment (e.g. chemo/radiotherapy) and the five year survival rates for the disease, it may be possible to predict the likely outcome for an undiagnosed patient. That said, this may be too obvious, e.g. if the tumour shows metastasis the patient will probably die, if there is no metastasis they will probably survive. I may have to look at a specific grade of disease. This is rather like the oiled seabird project in the ETMA for T326. An AI system like this, if it worked, would be able to predict which patients are likely to survive and so on whom to use valuable resources. In practice treatment is never as cut and dried as this. The more I think about this the less convinced I am that it would be any good.
About Me
- Rob
- My goal in life is to become grumpier. There's no point getting older unless you become grumpier. Working for the NHS helps as does supporting West Ham, so one day I'll end up like Victor Meldrew.