Sunday, 2 December 2007

And Finally

This may be my last blog entry as I am just completing the last question of TMA04. The poster is done. I hope it is OK but it's too late to worry about now. I did it on pre-processing which may not be technical enough.

Anyway that's it. Top Gear is on in 30 mins and I'm off.

Sunday, 11 November 2007

Almost There

Looks nice. Have no idea how good it is. I'm not that impressed. I keep thinking about things I could have done. Will print out the final version this evening.

Saturday, 3 November 2007

Saturday 3rd November

Continuing to write up. Taking longer than I would like

Monday, 22 October 2007

Monday 22nd October

Last night I finished all the practical work. I managed to test an MLP with the same inputs as the KBS. I tried various certainty factors and rules with the KBS to improve performance but it could not match the MLP. At least this is a direct comparison of KBS vs NNs, which none of the other experiments are.

Began write up today. I will just carry on from the draft really as there wasn't much wrong with that. Word 2007 does make it look impressive. So even if it's crap it will look good.

Sunday, 21 October 2007

Sunday 21st

Rechecked scoring of KBS experiments and corrected a few mistakes. Both experiments score the same, though the PPVs are different.

A few observations about these KBS:

Is there any point in putting three rules in for the grade, as by inference if it’s not one then it’s one of the other two? Would two rules be enough?

There are no cases in the training set with 18+ lymph nodes but this rule does exert an influence on the other cases. Do I remove it?

Currently modifying original KBS to test this.

----------------
Now playing: The Avalanches - Frontier Psychiatrist
via FoxyTunes

Saturday, 20 October 2007

Saturday 20th October

A rare weekend day with not a great deal to do. So more OU. Not much more to do now. I completed testing and scoring my flex program yesterday. I built an MLP last night and tested it with the same training file as I have been using to test the flex program; however, I did this with the full 32 input neurons. The MLP scored very well but it will be interesting to see how it scores with the same number of inputs as the flex program. This is what I am doing now. I am modifying the KBS input files for the MLP.

I have in the meantime modified the certainty factors in the flex program to try to improve the performance but without success.

Will complete this practical work and begin writing up the results and the project this weekend.

T-25 days

----------------
Now playing: Super Furry Animals - Lazer Beam
via FoxyTunes

Friday, 19 October 2007

Friday 19th Arrrggghhhh!

Tried copying the line "prognosis is reccurence" from rule r1c to everywhere it occurred in the other rules etc. Then recompiled and tried again. Rule r8c is now working properly. Cannot see why this happened. See below:

C.F. : TRY : r8c
C.F. : LOOKUP : (grade is '3') = -1
C.F. : IMPLIES : cf(0.35) @ -1 -> -0.35
C.F. : LOOKUP : (prognosis is reccurence) = 0.2
C.F. : CONFIRMS : 0.2 + -0.35 -> -0.1875
C.F. : UPDATE : (prognosis is reccurence) = -0.1875
C.F. : FIRED : r8c

Looked at the certainty factor example in Chapter 3 of T396 and it doesn't simply add certainty factors together.

Start testing KBS again!

----------------
Now playing: Maximo Park - Going Missing
via FoxyTunes

Friday

Yesterday I did a number of experiments with NNs. I couldn't get a score of more than 51% for any network topology despite adjusting training cycles etc. The performance of many of the networks was remarkably similar.

I identified those cases which gave false +ve and -ve results and added them to the training sets but performance deteriorated. The data is ambiguous. Some patients have virtually the same sets of data but different outcomes. There are not enough data items to improve the performance - no ER/PR results etc.

Preprocessed the training set for the KBS. I couldn't get Excel to output a .csv file so I had to produce it manually. I then copied these into Word so that I could paste whole strings into the console window of flex.

Today I have been running these and have noticed that rule 8c is not working properly: it replaces the certainty accumulated from the previous rules instead of combining with it.

C.F. : TRY : r9c
C.F. : LOOKUP : (location is central) = -1
C.F. : LOOKUP : (grade is '1') = 1
C.F. : AND : -1 + 1 -> -1
C.F. : IMPLIES : cf(-0.5) @ -1 -> 0.5
C.F. : LOOKUP : (prognosis is reccurence) = 0.1
C.F. : CONFIRMS : 0.1 + 0.5 -> 0.55
C.F. : UPDATE : (prognosis is reccurence) = 0.55
C.F. : FIRED : r9c

C.F. : TRY : r8c
C.F. : LOOKUP : (grade is '3') = -1
C.F. : IMPLIES : cf(0.35) @ -1 -> -0.35
C.F. : UPDATE : (prognosis is recurrence) = -0.35
C.F. : FIRED : r8c

C.F. : TRY : r7c
C.F. : LOOKUP : (grade is '2') = -1
C.F. : IMPLIES : cf(0.1) @ -1 -> -0.1
C.F. : LOOKUP : (prognosis is recurrence) = -0.35
C.F. : CONFIRMS : -0.35 + -0.1 -> -0.415
C.F. : UPDATE : (prognosis is recurrence) = -0.415
C.F. : FIRED : r7c

I need to sort this before continuing.

Also, if the C.F. : CONFIRMS lines are supposed to be adding the numbers together, they are wrong.

C.F. : TRY : r11c
C.F. : LOOKUP : (size is less_than_15) = 1
C.F. : IMPLIES : cf(-0.25) @ 1 -> -0.25
C.F. : LOOKUP : (prognosis is reccurence) = -0.5
C.F. : CONFIRMS : -0.5 + -0.25 -> -0.625 [-0.5+ -0.25 = -0.75]
C.F. : UPDATE : (prognosis is reccurence) = -0.625
C.F. : FIRED : r11c
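For reference, the CONFIRMS figures in the traces above are consistent with the usual MYCIN-style rule for combining certainty factors rather than with plain addition. A minimal Python sketch, assuming that this is the rule flex applies (the function name `combine_cf` is made up for illustration and is not part of flex):

```python
def combine_cf(current: float, new: float) -> float:
    """Combine two certainty factors in [-1, 1] using the MYCIN-style rule.

    Assumption: this is the rule behind the CONFIRMS lines. It reproduces
    the figures in the traces but is not taken from the flex manual.
    """
    if current >= 0 and new >= 0:
        # Two pieces of supporting evidence.
        return current + new * (1 - current)
    if current <= 0 and new <= 0:
        # Two pieces of opposing evidence.
        return current + new * (1 + current)
    # Mixed signs (undefined when one value is +1 and the other is -1).
    return (current + new) / (1 - min(abs(current), abs(new)))


# CONFIRMS inputs copied from the traces above.
for current, new in [(-0.5, -0.25), (0.1, 0.5), (-0.35, -0.1), (0.2, -0.35)]:
    print(current, new, round(combine_cf(current, new), 4))
```

Rounded to four places, these four pairs give -0.625, 0.55, -0.415 and -0.1875, the same values flex reports.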

Wednesday, 17 October 2007

Wednesday 17th October 11p.m.

A day of testing NNs. All the MLPs I tested regardless of the number of training iterations or hidden layer neurons gave very similar results. So I analysed the results of Exp E, as it was fairly representative, for false +ves and -ves. I then produced a new training file of some of these and the original random cases but still with 50 cases.

The results of an MLP with 10 hidden layer neurons and 50000 training iterations were poorer than using the random set.

Next try a training set of all false +ves and -ves plus the original training set.

Wednesday 17th October 11a.m.

More experimenting. Three days to complete the experiments and then a couple of weeks to write up. Well that's the plan anyway.

First up, complete the neural network experiments. Not happy with the ones I've carried out before; they are too ad hoc. I will organise them better this time.

So first create an MLP and work out the optimum number of training cycles. Do each experiment at least twice because NNs do not always cluster data the same way each time they are trained.

Score with score tool and calculate sensitivity, specificity and PPV.

Then experiment using different numbers of hidden layer neurons to find optimum.
Then, well I'll see how it's going.

----------------
Now playing: Jeff Wayne - Dead London
via FoxyTunes

Saturday, 22 September 2007

22nd September.

Tried a Kohonen SOM during the week with the defaults from NeuralWorks and using the entire 285 cases. With the score tool it scored 51%, which is higher than any MLP, but those were only trained with the training set. I will rerun both networks later to ensure I'm comparing like with like. Then I'll compare this with my KBS.

Tuesday, 4 September 2007

4th September

Well, time to get on after a couple of weeks doing sod all. Really surprised and pleased by the result for TMA03. I'm not entirely sure which task to do next. I'll probably do some more preprocessing of the data to enable me to test the flex program and score the output. I think this will be quite laborious so I'd better do it sooner rather than later.

Wednesday, 15 August 2007

Wednesday 15th August

I'm making this decision as I type it into TMA03. I mentioned on the blog before that I wanted to compare like with like, so I wanted to construct a neural network with the same number of inputs as the flex program. So for a rule in the flex program such as:

uncertainty_rule r4c

if the involved_nodes is '>=6 and <=17'

then the prognosis is reccurence

with certainty factor 0.20 .

The equivalent input in the neural network will be given by whether the statement:

Involved node is greater than or equal to 6 but less than or equal to 17

is true or false.
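A small Python sketch of that idea, where each rule premise becomes one yes/no input. The patient field names, the millimetre unit for size and the 1/0 coding are assumptions for illustration; only the rule labels and conditions come from the flex program.

```python
# Hypothetical patient record; the field names are illustrative only.
patient = {"involved_nodes": 9, "grade": 3, "size_mm": 22}

# One yes/no condition per rule premise, keyed by the flex rule label.
CONDITIONS = {
    "r4c": lambda p: 6 <= p["involved_nodes"] <= 17,
    "r8c": lambda p: p["grade"] == 3,
    "r11c": lambda p: p["size_mm"] < 15,  # assumes size is recorded in mm
}


def rule_inputs(p: dict) -> list[int]:
    """Return one network input per rule: 1 if the premise is true, else 0."""
    return [1 if condition(p) else 0 for condition in CONDITIONS.values()]


print(rule_inputs(patient))  # [1, 1, 0]
```

Coded this way the network sees exactly the same evidence as the KBS, which is the point of the like-with-like comparison.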

Monday, 13 August 2007

Monday 13th August - Oops

Noted when writing up the draft for TMA03 that I have not mentioned that I decided to use Positive Predictive Value, Specificity and Sensitivity as performance indicators as these are more universally understood than the OU score tool.

Monday 13/08/07

Spent most of yesterday writing Q2 of TMA03. I'm not sure I entirely understand what is meant by the "doing" part of the project but I've done what I can, bearing in mind the practical work is far from complete.

I've actually started to enjoy using Flex. I've been modifying the program that uses certainty factors and added more rules (kbs 6 and 7) and have tested with a couple of patients and it seems to work fine. The certainties of the evidence which are used at the start of the program have caused much head scratching. For example:

All patients under 30 years old do not suffer recurrence. So I can assign a certainty of -1.0 to the rule for the certainty that the patient will suffer recurrence. But if the patient is over 30, then the statement "is the patient under 30" is definitely not true, so the certainty factor of that evidence in the starting statement entered in the console would be -1. That implies that recurrence must occur, which isn't true, I think.

So I've fiddled with the certainty factors in the rules to try to overcome this problem and tested with -1 or 0 in the starting statement when the condition is not true.
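The problem can be seen with a couple of lines of arithmetic. In the flex traces the IMPLIES step looks like a simple multiplication of the rule's certainty factor by the certainty of the evidence (for example cf(-0.5) @ -1 -> 0.5). A minimal Python sketch under that assumption; the function and constant names are made up:

```python
def implies(rule_cf: float, evidence_cf: float) -> float:
    """Scale a rule's certainty factor by the certainty of its evidence.

    Assumption: IMPLIES is plain multiplication, as the traces suggest.
    """
    return rule_cf * evidence_cf


UNDER_30_RULE_CF = -1.0  # "patient is under 30" rules out recurrence

# Evidence certainty: 1 = definitely under 30, -1 = definitely not, 0 = unknown.
for evidence_cf in (1.0, -1.0, 0.0):
    print(evidence_cf, implies(UNDER_30_RULE_CF, evidence_cf))
```

With evidence of -1 the rule contributes +1, which is the unwanted "recurrence must occur" conclusion. With evidence of 0 the rule contributes nothing, which is why entering 0 when the condition is not true behaves more sensibly.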

I am actually finding the TMA a pain because I actually have some enthusiasm for Flex at the moment and want to get on, but I have to finish the TMA.

Saturday, 11 August 2007

Saturday 11th August

Oh well, so much for the new season. Some things don't change. Found a useful, if rather old, reference for prognostic indicators in breast cancer.

Histopathology

Volume 19 Issue 5 Page 403-410, November 1991

To cite this article: C.W. ELSTON, I.O. ELLIS (1991)
Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up
Histopathology 19 (5), 403–410.

Need to look for it at work.

Friday, 10 August 2007

Friday 10th August

Modified flex certainty factor program. The program compiles and runs but the results are not as expected. The certainty factor associated with the prognosis is adjusted after each rule, so that the final value is the combined result after all the rules are evaluated. But if all patients under 30 do not have recurrence, it shouldn't matter what values the rules that look at location, number of LNs etc. assign: the certainty should always be 1. However, the other rules are affecting this value. Not easy to explain.

Will read through Hopgood and the example in block one of T396. I think I will need fewer rules with "or" statements in them.

TMA really needs to get moving tomorrow. Not sure how to tackle it. Too late to contact tutor now. Will do the best I can in the time I have.

Must state that all I want is to pass this project. After 7 years of OU I am utterly fed up with studying.

Football season kicks off tomorrow. The only time I will be away from this PC will be to watch the Hammers.

Wednesday, 8 August 2007

Wednesday 8th August

TMA deadline looming. Need to get it sorted.

Spent some time modifying my flex program using certainty factors. It is almost working. Hopefully a little more time will sort it out. I will see if I can add to the 6 rules I have. If not I will test when it is working. Need to sort out testing and scoring strategy. Probably use a similar score tool as the NN but with fewer instances to test. Up to 50 patients would be enough.

Then I'll modify the NN so that it has the same data inputs as the flex program and test. That way I'll be testing like with like.

Probably resume this work now after the TMA.

Sunday, 5 August 2007

Later Sunday

Too hot to be really productive. However question 1 of TMA03 is OK. Not entirely sure about question 2.

Decided just to run the flex program with uncertainty rules only as I am sure I can get it to work, though it is cumbersome to test.

Noticed in flex manual that it states:

Given a rule:
rule1: if A & B then C
there are 3 potential areas for uncertainty.
- Uncertainty in data (how true are A and B)
- Uncertainty in the rule (how often does A and B imply C)
- Impreciseness in general
The first 2 can be handled using probabilities and the third using fuzzy logic.

The main sources of uncertainty in this project are in the data (it is difficult to measure tumours accurately etc.) and in the rules: only four rules are always true according to the statistical analysis of this dataset, and these are probably not always true with other datasets. Therefore I was justified in using uncertainty rules to deal with these sources of uncertainty.

Sunday 5th August

Back from holiday. I did think I might be able to work on the laptop while away but never did. So am going to work some more on the project draft for TMA03. I have decided that I must compare like with like. In the project I did for T396 there were many more inputs and bird instances used in the neural network than the KBS. Therefore it was hardly surprising it outperformed the KBS. After I have a satisfactory NN and KBS working I will either scale up the KBS to use the same number of patient instances as the NN and data inputs, or more likely I will scale down the NN to use the same number of patient instances as the KBS etc.

Back to work.

Saturday, 21 July 2007

Saturday 21st July

Forgot to note that I have been working on a flex program for predicting breast cancer recurrence. I am trying to combine an earlier working flex program which had rules which were always true with rules that are not always true. I am using certainty factors to calculate the probability that the cancer will recur.

I decided to use certainty factors mainly because I feel I can understand them, and also because after reading Hopgood I discovered they are for uncertain data and rules, not uncertain language.

Anyway off on holiday now. Will have to resume when I get back. I am feeling much more confident that I can get a reasonable flex program working.

Also started a draft of the project, which is required for TMA03.

Sunday, 15 July 2007

Sunday afternoon

Tried to work out how the dialog editor works but there is no mention of it in the help files. It would have been good to enter the patient data using a custom form but I haven't the time to waste trying to sort this out.

Had a read through a tutorial with flex which explained that putting the numbers between single quotes (') makes flex treat them as atoms, not as numbers. I have adjusted the code and the program now runs fine.

group no_involved_nodes '0-2', '3-5', '6-8', '18+'.

question involved_nodes
how many involved_nodes are there ? ;
choose from no_involved_nodes.

Sunday 15th July

I thought I'd posted something during the week. I have been working through a series of experiments with a multi-layer perceptron. I have been looking at the number of hidden layer neurons, the number of training cycles and the score tool threshold. I have noticed that the network's performance never gives a score of more than 48% no matter how many training cycles are defined.

I have switched my attention to the knowledge based system. I have been having trouble with Flex because I was using questions with the answers in groups but the answers were numbers and I couldn't get flex to treat them as just text for the purposes of these questions. I have therefore developed questions in a different way. I will test this shortly.

Monday, 9 July 2007

Monday 09/07/2007

After a frustrating day yesterday because Windows Vista doesn't seem to let NeuralWorks write the results files out, I reverted to using Windows XP on a different PC. I tested a 32 input layer, 15 hidden layer and 2 output layer MLP with the full 285 patients after training the network. I then adjusted the score tool threshold to find an optimum but this wasn't as simple as I thought. As the threshold was raised, the network generally performed more poorly, and this was reflected in the statistical analysis and the overall score. However, when the threshold reached 0.9, although by most measures of performance the network performed more poorly, the positive predictive value of the network improved. I checked the calculation of this value but it looks correct so I can only assume that it is probably a statistical anomaly. I may test the score tool threshold further with different network configurations but

Sunday, 1 July 2007

Sunday 1st July

Picked up last TMA result. I did about as well as I expected. Have been experimenting with my neural network. Have made up a large spreadsheet to log the results. I have enhanced the score tool to calculate the false positive and false negative rate. I hope to modify it further to include the positive predictive value and the sensitivity as these are well defined measures of the performance of the neural network and allow comparison with other methods of classifying data such as that used here. I've run a few experiments and the network is working though at this stage not very well.
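These measures all come from the same four counts (true and false positives and negatives), so they are easy to calculate together. A short Python sketch using the standard definitions; the function names and the example counts are made up and are not results from my experiments:

```python
import math


def _ratio(numerator: int, denominator: int) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def score_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard diagnostic measures, with 'positive' meaning recurrence."""
    return {
        "sensitivity": _ratio(tp, tp + fn),  # recurrences correctly flagged
        "specificity": _ratio(tn, tn + fp),  # non-recurrences correctly cleared
        "ppv": _ratio(tp, tp + fp),          # flagged cases that did recur
        "false_positive_rate": _ratio(fp, fp + tn),
        "false_negative_rate": _ratio(fn, fn + tp),
    }


# Made-up counts for illustration only (they sum to 285 cases).
metrics = score_metrics(tp=30, fp=20, tn=180, fn=55)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that PPV depends only on the cases the network flags as positive, so it can move in a different direction from sensitivity when the score threshold changes.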

Three weeks till hols so not much time.

Tuesday, 5 June 2007

Tuesday 5th June

Another TMA day. The word count is very restrictive. I've tried but I can't get it near the 800 words per question. Almost done. Will submit tomorrow.

Monday, 4 June 2007

Monday 4th June

Answered question 1a of TMA02. Realised in answering 1b that my literature search was rather crap but I'll answer as best I can and continue to search for more information as the project progresses. For all the searching I have done on IEEE Xplore etc. the papers I have found are rather high powered and quickly lose me when they delve into mathematical formulae. I have found general sites like Wikipedia far more interesting and useful.

Sunday, 3 June 2007

Sunday Evening

Ran a very rudimentary flex program with one rule but at least it worked and it's a start. I loathe using flex but I've got to get to grips with it. I've started from the boiler example in T396 and it works so I'll just work through the examples I've got gradually till I get to an object based program using certainty factors. I may yet try another way of handling uncertainty in the data.

Now back to the TMA.

Sunday 3rd June

Have modified an old flex program from T396 to test the 3 rules that are correct all the time just to ensure this works OK (rbs1.ksl). The presentation of the questions will have to change and I will need to add the further rules with certainty factors for true comparison with the neural networks.

Started TMA write up yesterday. Should be OK.

Saved the training file as an MS-DOS text file from Excel and this trains the MLP without error. However, although the test data saved in the same format does run, the output file does not contain data that can be interpreted. I will have to spend some time looking at this.

Wednesday, 30 May 2007

Wednesday 30th May

I went through the PROMPT criteria for evaluating information from the literature search in preparation for the TMA. Have booked 2 days off to get it done. Have been trawling around for information in journal articles to include in the project. The MIT course lecture notes for Medical Artificial Intelligence are heavy going but there is some relevant information about neural networks.

Tuesday, 29 May 2007

Tuesday 29th May

Have now finished preprocessing the data for the neural network and created a training file of 50 random patients. Have also created the test file of all 285 cases for the neural network. I created a multi-layered perceptron with 32 input neurons, 10 hidden layer neurons and 2 output neurons and tried to train this simple network. However NeuralWorks gives an error with the training file. This is a text file with a .nnb file extension. I will make sure that this file is in the correct format for NeuralWorks and recheck the number of neurons.
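The random selection of training cases could be scripted rather than done by hand. A minimal Python sketch, assuming the cases are simply numbered 1 to 285; the function name and the fixed seed are illustrative choices, and writing the NeuralWorks file itself is not shown:

```python
import random


def pick_training_cases(case_ids, n_train=50, seed=2007):
    """Pick n_train distinct case IDs at random (seeded so it is repeatable)."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(case_ids), n_train))


all_cases = list(range(1, 286))  # 285 patient cases, numbered 1..285
training_cases = pick_training_cases(all_cases)
test_cases = all_cases           # the test file uses every case

print(len(training_cases), len(test_cases))  # 50 285
```

Fixing the seed means the same 50 patients can be regenerated later if the file has to be rebuilt.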

Sunday, 27 May 2007

Sunday 27th May

Have been spending a lot of time preprocessing the breast cancer data for the neural network. Until this is complete I cannot start testing. I have taken my tutor's advice and tried to reduce the number of input neurons, though I have still probably got too many. The statistical analysis of the data suggested that some attributes have no influence on tumour recurrence. The location of the tumour in the breast, which I took a long time converting into quadrants (and the centre), seems to have no influence on recurrence, because there is a roughly 70% / 30% split between non-recurrence and recurrence, which is the same as the overall ratio of non-recurrence to recurrence. Also some attributes are related. For example the menopausal status is age related so I have removed menopausal status.

I will now carry on with this preprocessing, which is rather laborious because I am doing it manually. I do not have the expertise to convert the raw data into the format required by the neural network any other way. There are likely to be mistakes in this conversion but I am trying to be as meticulous as I can.

Saturday, 19 May 2007

Saturday 19th May

May have been a bit hasty declaring I had preprocessed all the data. I have been looking at the statistical analysis of the data today and have identified a number of rules. After having read through the T396 project I have decided to handle uncertainty using certainty factors as I seem to be able to understand this more than fuzzy logic and Bayesian updating. After the statistical analysis I have decided that I will not use all the possible attributes as input neurons for the neural network.

For example there seems to be no significant relationship between the location of the tumour in the breast and the chances of recurrence of the tumour, apart from a much lower incidence of recurrence if the tumour is located centrally. Therefore I will cut this down from 5 input neurons (one for each quadrant) to 2 input neurons (central or not). There are several other possible situations where the number of input neurons could be reduced.
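As a sketch of what that reduction looks like in the input file, here is the location coded both ways in Python. The location labels are hypothetical (I have not listed the exact names used in the spreadsheet), and the function names are made up:

```python
# Hypothetical location labels: four quadrants plus the centre.
LOCATIONS = ["upper_outer", "upper_inner", "lower_outer", "lower_inner", "central"]


def encode_location_full(location: str) -> list[int]:
    """Five inputs: one neuron per location, exactly one of them set to 1."""
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location}")
    return [1 if location == name else 0 for name in LOCATIONS]


def encode_location_reduced(location: str) -> list[int]:
    """Two inputs: [central, not central]."""
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location}")
    return [1, 0] if location == "central" else [0, 1]


print(encode_location_full("lower_inner"))     # [0, 0, 0, 1, 0]
print(encode_location_reduced("lower_inner"))  # [0, 1]
print(encode_location_reduced("central"))      # [1, 0]
```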

I hope to very shortly complete the rules identification. I have four rules that are always true and the rest are mostly true so will need to be used with the certainty factors.

As an aside, and worth remembering when discussing this project, there is no indication of how long it was from when the tumour was removed to the assessment of whether there was recurrence or not, i.e. if the patients without recurrence were followed up a few years later, would the tumour still not have recurred?

Wednesday, 16 May 2007

16th May

Did some statistical analysis on the breast data in a similar way to the T396 project I did in 2005. I am trying to look for some rules in the data but they are not obvious. Looked at the examples on the course for using certainty factors in flex. Still not sure which way to do this: certainty factors, fuzzy logic or Bayesian updating.

Am spending so much time configuring the order comms system at work that I am less than keen to spend time using a computer when I get home. Still the weather's crap which helps.

Tuesday, 15 May 2007

15th May

Football season over. West Ham safe, now to pull my finger out again. Finished preprocessing the data for the neural network.

Wednesday, 25 April 2007

Wednesday 25th April

Started preprocessing data by converting two data fields (side and sector) into one (quadrant). Bit of a pain updating 285 records. Still it's a start.

Saturday, 21 April 2007

Saturday 21/04/07

Spent some time analysing the data for trends to develop rules. There are some that stand out, e.g. the higher the histological grade the greater the chances of recurrence. This is about all I've done lately but I'm not worried as there's some room in the schedule at the moment.

It's been a really good day as I received my TMA result and was shocked at how good it was. I would have been happy with 50%. Anyway I mustn't let it go to my head. West Ham won too so great.

Tuesday, 3 April 2007

Tuesday 3rd April

Completing TMA01. Have drawn up a schedule for the project because I had to for question 1. I should have done this earlier; that way I may not have spent so much time choosing the dataset and may have made more progress in deciding what I would actually do. Hence I have been rather vague about the KBS because I should have spent more time on this.

Added link for background info about breast cancer prognosis

Sunday, 1 April 2007

Week beginning 25th March

Spent time at work extracting breast cancer data from WinPath. I spoke to staff from the breast care team to see if I can find out about patient outcomes from those who had histology results. I have a couple of contacts to follow up.

I have decided to try to use the data from work if I can get the clinical outcomes, because I always have the breast cancer data that I have downloaded as a backup if this isn't achievable. I will have to be careful not to use patient identifiable data because of the Data Protection Act. Therefore I won't be able to use a date of birth field but age will be OK.

Downloaded several papers on breast cancer prognosis and artificial intelligence via IEEE Xplore. I am reading through these but generally they have used neural networks with some success and fuzzy logic in hybrid systems. I would like to use neural networks in a similar way although, providing I can use the data from work, I will use different prognostic indicators such as oestrogen and progesterone receptor status and HER2 status, which were not included in these papers. I will try training a multilayer perceptron and comparing this with a radial basis function network and/or Kohonen self organising network to look at supervised vs unsupervised learning. I will try using one of these networks that use unsupervised learning to cluster the data, coupled to an MLP to classify the data clusters.

I will try using a rule based system to compare its performance with that of the neural networks. I will try using fuzzy logic to produce rules based on analysing the data I have. For example: If number of lymph nodes is high and the tumour is large then the chance of recurrence is high.

Now to sort out that TMA.

Monday, 26 March 2007

Monday 26/03/07

Decided to search for examples of AI in cancer prognosis for the literature search part of TMA01. I have started with a simple Google search for the terms: AI cancer prognosis.

I immediately read a useful article on the New Scientist website which describes using a neural network and fuzzy logic to predict the disease spread and 5 year survival rates for patients presenting with various types and stages of breast cancer. Link added "Artificial Intelligence Tackles Breast Cancer"

I followed the link on the New Scientist site to the Biomedical Computing and Engineering Technologies Applied Research Group who produced the original paper. Here there is a list of very useful publications and journal articles. I will use the OU's library to see if I can access the full text of some of these articles.

References on site that may be useful:

NAGUIB, R.N.G. and SHERBET, G.V. (2001) Artificial neural networks in cancer diagnosis, CRC Press. Prognosis and Patient Management ISBN: 8493-9692

Seker H, Odetayo MO, Petrovic D and Naguib RNG. A Fuzzy Logic Based Method for Prognostic Decision Making in Breast and Prostate Cancers. IEEE Trans. Inform. Tech. Biomed., 2003, 7 (2): 114-122.

Seker H, Odetayo M, Petrovic D, Naguib RNG, Bartoli C, Alasio L, Lakshmi MS and Sherbet GV. Neuro-Fuzzy Rule-Based Intelligent Survival Analysis of Breast Cancer Patients Using Histological and Image Cytometric Prognostic Factors. Proc. Am. Assoc. for Cancer Res., Washington, DC, USA, 2003, 44: 4513.

Also discovered this article which is very interesting and provides background on how prognosis is currently derived.

Matt Williams and Jon Williams. Combining Argumentation and Bayesian Nets for Breast Cancer Prognosis (Draft), May 2005

Sunday 25/03/07

Just over a week to go till the TMA has to be submitted. I read through the TMA again and realised I still have a lot to do and really haven't planned as well as I should. I have downloaded the breast cancer data from the Machine Learning website and put it into MS Access. I have written some queries to look at each of the attributes in the table to see how they correlate with the classification of tumour recurrence or no recurrence. No obvious patterns seem to emerge from the data. I therefore think that whichever AI techniques I use, they must be able to handle uncertainty. It won't be possible to say patient x's tumour will recur but it will be possible to give the probability that it will recur.
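The kind of query I mean is just a count of recurrence against each value of an attribute. The same thing sketched in Python rather than Access, with hypothetical field names and a tiny made-up sample instead of the real 285 records:

```python
from collections import defaultdict


def recurrence_rate_by(records, attribute):
    """Return {attribute value: proportion of cases with recurrence}."""
    totals = defaultdict(int)
    recurred = defaultdict(int)
    for record in records:
        value = record[attribute]
        totals[value] += 1
        if record["recurrence"]:
            recurred[value] += 1
    return {value: recurred[value] / totals[value] for value in totals}


# Made-up records for illustration only; not taken from the dataset.
records = [
    {"grade": 1, "recurrence": False},
    {"grade": 1, "recurrence": False},
    {"grade": 3, "recurrence": True},
    {"grade": 3, "recurrence": False},
]

print(recurrence_rate_by(records, "grade"))  # {1: 0.0, 3: 0.5}
```

An attribute whose rates all sit close to the overall recurrence rate is unlikely to be much use as a predictor on its own.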

Saturday, 24 March 2007

Saturday 24th March

Received replacement CD yesterday but was sent the wrong one - Health & Social Sciences. I will have to ask again - more time wasted.

Used Google to search for "Data Classification" and "Survey Classification". I added a link to a paper on classification. I also searched for "Data Mining" which took me to Wikipedia again. Added that link too. I've ruled out my plan to use the RC Path workload idea because I will need to rely on the pathologists to score the requests. I don't think I can trust them to do it.

I am going to pop into WH Smiths to see what data is in the back of the Hi-Fi mags etc. The Top Gear magazine has lots of data about cars in the back. I have a few ideas using that.

Sunday, 18 March 2007

Sunday PM

Really not sure about this project proposal so far. I am going to download the breast cancer data from the machine learning repository and have a look at this.

Added some more links. Must do some more reading and find the CD-ROM.

Sunday 18/03/07

I read through the Royal College of Pathologists Workload Guidelines again trying to pick out data fields that contribute to the workload and that are already captured on the lab system. I decided that there are 7 data items currently held in the lab system WinPath that the RCPath Workload Guidelines state are factors contributing to the overall workload of the pathologist. These are:

  1. Specimen Type e.g. Breast Biopsy, Skin Biopsy etc. Approx 250 different types.
  2. Levels - Are stains at various levels through the tissue block required. The more slides cut the longer it takes to examine them all.
  3. Number of Special Stains Requested
  4. Number of Immunocytochemistry Slides
  5. 2nd Pathologists Opinion Required - Usually suspected cancer cases
  6. Minimum Dataset Completed
  7. Presentation at Multi Disciplinary Team Meeting

There are other data items recorded on the lab system which, although not specifically mentioned in the RCPath guidelines, do contribute to the overall complexity of the case and therefore the workload of the pathologist. These are:

  1. Number of Specimens - Each case may consist of more than one specimen. The greater the number of specimens the greater the workload. The workload could be measured per specimen but the RCPath Guidelines suggest a workload per case with each specimen contributing to the overall score. I have chosen just to use the most major specimen type for each case as this has the greatest contribution to the overall work required to report the case. I will add some new specimen types to help deal with cases where we get multiple pots with a low complexity but whose number makes them more time consuming.
  2. Number of blocks - Generally the number of blocks represents the amount of sampling the specimen(s) require. Complex cases usually require greater numbers of blocks to be taken. Each block produces a minimum of one slide for the pathologist to examine.
  3. Extra H&E slides - A pathologist may request further levels on a block if the case is difficult to report.
There is one data item mentioned in the guidelines that is not captured reliably by the lab system - usually due to user error - so I have developed a proforma to print on the back of the lab request forms to capture this accurately.

  • Pathologist/BMS Cut Up - Simple cases are often described and sampled by BMS staff and are only microscopically examined by the pathologist.
Finally the workload for each case is classified on a scale of 1 to 10 with 1 being the least workload and 10 the highest. This is currently not assigned.

I aim to get the pathologists to assign the workload to their cases following the RCPath guidelines. I will then extract all the data from the lab system, add the supplementary data field (Pathologist/BMS Cut Up) to each case, and process the data into a form acceptable to a NN. I will train the network with data that includes the workweight assigned by the pathologist, then test it with data that doesn't include the assigned workweight and compare its output with the workweight assigned by the pathologists. The lab number can be used to match the workweight from the NN to that from the pathologist. I would use a NN with 11 input neurons, one for each of the 11 data items, and 10 output neurons, one for each workweight.
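To make the 11-in, 10-out idea a bit more concrete, here is a rough Python sketch of how one case might be turned into network inputs and a target output. It is only an illustration: the field names, the numeric specimen type code and the 0/1 coding of the yes/no items are my own assumptions, not anything taken from WinPath or the RCPath guidelines.

```python
from dataclasses import dataclass, astuple

NUM_WORKWEIGHTS = 10  # workweight classes run from 1 to 10


@dataclass
class Case:
    # Hypothetical field names, one per data item listed above (11 in total).
    specimen_type_code: float  # numeric code for the main specimen type (coding scheme is an assumption)
    levels: float              # 1 if levels were requested, else 0
    special_stains: float      # number of special stains requested
    immuno_slides: float       # number of immunocytochemistry slides
    second_opinion: float      # 1 if a 2nd pathologist's opinion was required, else 0
    minimum_dataset: float     # 1 if a minimum dataset was completed, else 0
    mdt_presentation: float    # 1 if presented at an MDT meeting, else 0
    num_specimens: float       # number of specimens in the case
    num_blocks: float          # number of blocks taken
    extra_he_slides: float     # number of extra H&E slides requested
    pathologist_cut_up: float  # 1 if cut up by a pathologist, 0 if by BMS staff


def encode_inputs(case):
    """Return the 11 input values, one per input neuron."""
    return [float(value) for value in astuple(case)]


def one_hot_workweight(workweight):
    """Return 10 target values with a 1.0 in the position of the assigned workweight."""
    if not 1 <= workweight <= NUM_WORKWEIGHTS:
        raise ValueError("workweight must be between 1 and 10")
    return [1.0 if i == workweight - 1 else 0.0 for i in range(NUM_WORKWEIGHTS)]


def decode_workweight(outputs):
    """Turn 10 output activations back into a workweight (the strongest output wins)."""
    strongest = max(range(len(outputs)), key=lambda i: outputs[i])
    return strongest + 1
```

In practice the count fields would probably need scaling to a common range before training, and a single numeric code for around 250 specimen types is a crude choice; both points are left open here.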

Problems:

  • The RCPath Guidelines, for want of a better description, are very woolly. I think NNs, due to their black box nature, can cope with the lack of clarity in the guidelines. However, a KBS needs rules to be able to classify each case, and rules are difficult to create from this data. Handling uncertainty - fuzzy logic etc - is one way to cope with this, but I think writing the rules in the first place would be difficult. I would probably have to analyse all the data and assign average scores to each data item - possibly, maybe. I think a KBS might be useful to preprocess the specimen type data before passing it to the NN.
  • Bias - I think the pathologists will over score the workload. They are not going to admit to being anything other than overworked.
  • Other factors that add to the pathologists' workload that aren't measured - e.g. Training of Junior Staff.
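The idea in the first problem above, of using a KBS to preprocess the specimen type before it reaches the NN, could look something like this very small Python sketch. It is a plain lookup table rather than a real rule based system such as Flex, and the specimen types and band numbers in it are invented placeholders, not values from the RCPath guidelines.

```python
# Placeholder rules only: these bands are made up for illustration
# and are NOT taken from the RCPath guidelines.
SPECIMEN_BANDS = {
    "skin biopsy": 1,
    "breast biopsy": 2,
}
DEFAULT_BAND = 0  # used when no rule matches the specimen type


def specimen_band(specimen_type):
    """Look up the complexity band for one specimen type, ignoring case and spacing."""
    return SPECIMEN_BANDS.get(specimen_type.strip().lower(), DEFAULT_BAND)


def main_specimen_band(specimen_types):
    """Return the highest band among a case's specimens (the 'main' specimen)."""
    return max((specimen_band(s) for s in specimen_types), default=DEFAULT_BAND)
```

For a case with a skin biopsy and a breast biopsy, `main_specimen_band(["Skin Biopsy", "Breast Biopsy"])` picks the breast biopsy band, which matches the decision to score each case by its most significant specimen.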

Last Week

Not much progress. Work commitments are difficult to manage, especially now that the Order Communications Project seems to have become important again.

I've read through the study guide again. I can't find the CD ROM it refers to. It must be here somewhere. There's too much junk by the PC for things to get lost in.

TMA01 deadline looming so I've booked time off work before this to help. It's very difficult to concentrate when the kids are bickering.

Monday, 5 March 2007

Monday Lunchtime

I've been searching Google using "database repository" as the search criteria and came across the BioMed Central Databases site. There are many databases of diseases listed. I looked at a couple of cancer ones but access to these was restricted. I will have a further look at this when I get home. I've added the link to the main site.

Sunday, 4 March 2007

Late Sunday 4th March

After having looked at the two database repositories suggested in the study guide I was pleased to see some databases containing medical data. I find this sort of thing interesting. I have enough data at work to use but I lack the clinical outcomes needed, which would be the classes that all the other attributes are used to predict. I will look at our Royal College of Pathologists Minimum Dataset data at work to see if there is any mileage in that, but if none of my ideas work I think the Breast Cancer data in the UCI repository of machine learning databases would offer me a fallback, which my tutor says I must have in case he feels my project is not suitable.

More Thoughts

So my biggest challenge on this project is going to be planning. I just don't work in an organised way. I'm sure I would get better results if I did, but I just don't stick to plans so I don't make them. I realise I am going to have to plan this properly.

I do have experience of doing a large project from my Fellowship of the Institute of Biomedical Science. I am pretty good at writing and producing reports. Certainly as far as the OU goes what my reports lack is substance! My knowledge of computing isn't the best, so I'll have to try harder.

Having read the study guide I realise that doing a project solely based around neural networks isn't enough. I am considering using a knowledge based system to preprocess some of the data into a form that a neural network can use, making it in effect a hybrid system.

I have downloaded the Royal College of Pathologists publication Guidelines on Staffing and Workload for Histopathology and Cytopathology Departments (2nd Edition) 2005. I will read through this to see how practical it is to calculate workload for histology specimens from the data in our lab's computer system. On first glance it seems somewhat ambitious. I plan to look at the suggested online databases from the study guide.

I must email my tutor to tell him of the existence of the blog so that he can comment if he wants to.

A hectic weekend and seeing West Ham lose in injury time has not put me in the best of moods.

Monday, 26 February 2007

Monday 26/02/07

Had a look around the OU site. I downloaded a presentation on the T4xx projects and got a bit of a fright. There was mention of CD-ROMs, accessing library resources etc. I've got to get my finger out! Need to read through the blurb the OU sent, and make sure I fully understand what's expected of me. Must check whether I've got a CD-ROM and read the study guide to work out what I have to do for TMA01.

Sunday 25/02/2007

More thoughts about the project. I think a rule based system may be the way to take the raw data and present this in a usable form to the NN. Still not sure about what to do the project on. Perhaps using AI to examine pathologist workload statistics, classifying different histology specimen types into the 10 different work weight categories depending on the number of blocks taken, the routine, special stains and immunocytochemistry requests on each case etc. I would like to do something in neuropathology. I like the idea of using an artificial neural network to analyse a problem in a real neural network. I need to have a chat with some of the pathologists this week but our Clinical Pathology Accreditation Inspection is on Tuesday and Wednesday so it won't be easy to find the time.

Opening Thoughts

The aim of this log is to show how I have developed this project from the beginning to the final document. I have an aversion to planning work properly so I will have to remain determined to write my thoughts in this log regularly. Up till now I have done some reading around AI techniques, particularly Neural Networks. My experience of T396 was that I found Flex, the rule based system, difficult to use but was much happier using the Neural Works. Therefore I really want to base this project around Neural Networks (NNs). So I want this project to look at the feasibility of using NNs in a particular situation rather than the more obvious project of comparing Rule Based Systems with NNs in a particular situation. That is my aim; however, I may find I need to consider Rule Based Systems as the project is implemented.

I have given a great deal of thought to what situation I will apply the NNs to. I have looked at one example project on the OU’s web site and read an interesting article about using a Kohonen Network to sort photos according to what is actually in the photo, rather like Adobe Photoshop. However I would really like to use data from work. The problem here though is getting hold of clinical outcomes of patients due to patient confidentiality. I would like to get results from various tests from Pathology, Radiography etc and see if an AI system can predict the clinical outcome for the patient. Cancer treatment seems to be a good area. If I can find out details of the disease that was diagnosed, the histology and immunocytochemistry results, the patient's age, sex, treatment e.g. chemo/radiotherapy and the five year survival rates for the disease, it may be possible to predict the likely outcome for an undiagnosed patient. That said this may be too obvious e.g. if the tumour shows metastasis the patient will probably die, if there is no metastasis they will probably survive. I may have to look at a specific grade of disease. This is rather like the oiled seabird project in the ETMA for T326. An AI system like this, if it worked, would be able to predict which patients are likely to survive and so on whom to use valuable resources. In practice treatment is never as cut and dried as this. The more I think about this the less convinced I am that it would be any good.


About Me

My goal in life is to become grumpier. There's no point getting older unless you become grumpier. Working for the NHS helps as does supporting West Ham, so one day I'll end up like Victor Meldrew.