Datasets used to train AI to detect skin cancer lack information on darker skin and are often incomplete

AI is increasingly being used in medicine to diagnose diseases like skin cancer more quickly and effectively. However, to work, AI needs to be ‘trained’ on large datasets of images from patients with a positive cancer diagnosis, so an AI program depends heavily on the information it is trained on.

Research presented today at the NCRI Festival by Dr David Wen from the Oxford University Hospitals NHS Foundation Trust has found an urgent need for greater diversity and quality in skin cancer and other skin lesion datasets, and for those datasets to include information on who is represented in them.

AI programs hold a lot of potential for diagnosing skin cancer because they can look at pictures and quickly and cost-effectively evaluate any worrying spots on the skin. However, it’s important to know about the images and patients used to develop programs, as these influence which groups of people the programs will be most effective for in real-life settings. Research has shown that programs trained only on images taken from people with lighter skin types might not be as accurate for people with darker skin, and vice versa.
- Dr David Wen

Dr Wen and his colleagues carried out the first ever review of all freely accessible sets of data on skin lesions around the world. They found 21 sets which collectively contained more than 100,000 pictures.

Diagnosis of skin cancer normally requires a photo of the worrying lesion as well as a picture taken with a special hand-held magnifier, called a dermatoscope, but only two of the 21 datasets included images taken with both of these methods. The datasets were also missing other important information, such as how images were chosen to be included, and evidence of ethical approval or patient consent.

Fourteen of the 21 datasets stated which country they came from and, of those, nine contained images from European countries. Only a small percentage of images were accompanied by information about the patients’ skin colour or ethnicity. Among pictures where skin colour was stated (2,436 pictures), only ten were of brown skin and only one was of dark brown or black skin. Among pictures where ethnicity was stated (1,585 pictures), none were from people with an African, Afro-Caribbean or South Asian background.

We found that for the majority of datasets, lots of important information about the images and patients in these datasets wasn’t reported. There was limited information on who, how and why the images were taken. This has implications for the programs developed from these images, due to uncertainty around how they may perform in different groups of people, especially in those who aren’t well represented in datasets, such as those with darker skin. This can potentially lead to these groups being excluded from, or even harmed by, AI technologies.

Although skin cancer is rarer in people with darker skin, there is evidence that those who do develop it may have worse disease or be more likely to die of the disease. One contributing factor could be that skin cancer is diagnosed too late.
- Dr David Wen

To guard against this, Dr Wen and his colleagues hope to create quality standards for health data used in AI development. This will include information on who should be represented in datasets and which patient characteristics should be recorded.

This information was presented today at the NCRI Festival where Oxford cancer researchers are speaking on their work.
