Mole or cancer? The algorithm that misses one in three melanomas and erases patients with dark skin
The Basque Country is implementing Quantus Skin in its health clinics after an investment of 1.6 million euros. Specialists criticise the artificial intelligence developed by the Asisa subsidiary due to its “poor” and “dangerous” results. The algorithm has been trained only with data from white patients.

Time is of the essence, especially with melanoma, the most dangerous skin cancer: diagnosing this tumour as early as possible matters more for saving lives than in any other cancer. In Spain, an estimated 9,400 cases of melanoma will be diagnosed in 2025. It is a very aggressive cancer that can spread rapidly and metastasise in just a few months. When that happens, the prognosis is usually poor, so detection errors can be fatal.
It is precisely this urgency that has led the Basque Country to turn to artificial intelligence. In 2025, the Basque health service, Osakidetza, is working on incorporating Quantus Skin, an algorithm designed to estimate the risk of skin cancer, including melanoma, into its primary care clinics and hospitals. In theory, it promises to speed up the process: general practitioners will be able to send images of suspicious lesions to the hospital dermatology service, together with the algorithm’s estimate of the probability that they are malignant. The Basque government’s idea is that Quantus Skin, which is currently being tested, will help prioritise patients for treatment.
A 1.6 million euro public contract for Asisa
In 2022, the Basque health service, Osakidetza, awarded a €1.6 million contract to Transmural Biotech to implement “artificial intelligence algorithms in medical imaging,” which required achieving a sensitivity and specificity of “at least 85%.” The company, created as a spin-off from the University of Barcelona and Hospital Clínic, belongs to the private insurance company Asisa. Although the specifications covered several types of cancer and other diseases, Osakidetza selected only two algorithms, among them Quantus Skin, due to their “greater healthcare impact” and to “obtain a higher return on health.” The decision, moreover, was taken unilaterally, without consulting specialists, Civio has learned. In February, Osakidetza stated that Quantus Skin had passed the “validation phase” and was in the “integration phase.” In a more recent response to queries from Civio, the service now says that it is still testing the algorithm and will take decisions “accounting for the results we obtain.” However, it did not address the fact that the published clinical results of Quantus Skin (69.1% sensitivity and 80.2% specificity) fall below the 85% threshold set in the public contract. Apart from the award in the Basque Country, Transmural Biotech has only one other public contract, in Catalonia, for a much smaller amount (25,000 euros), to certify artificial intelligence algorithms in radiology.
However, the data are troubling. Transmural Biotech, the company that markets Quantus Skin, conducted an initial study with promising results, but it had significant limitations: it was conducted entirely online and was not published in any academic journal, meaning that it did not undergo the usual quality control required in science.
Later, dermatologists from Ramón y Cajal Hospital and professors from the Complutense University of Madrid conducted a second, published study to evaluate the actual clinical efficacy of Quantus Skin. This study, which received funding and technical assistance from Transmural Biotech, showed worse results: the algorithm misses one in three melanomas. Its sensitivity is 69.1%, meaning it fails to detect almost 31% of real cases of this potentially lethal cancer.
When Civio contacted the CEO of Transmural Biotech, David Fernández Rodríguez, he responded evasively by email: “I don’t know what it is right now.” When pressed by phone, he changed his story: “What we were doing was testing,” to detect possible implementation problems. At the end of the call, Fernández Rodríguez acknowledged that Quantus Skin “didn’t stop working, it just worked much worse, but we had to figure out why.”
Fernández Rodríguez attributes these poorer results to deficiencies in how the images were captured, caused by not following Quantus Skin’s instructions. This is something they have also seen in trials in the Basque Country: “Primary care doctors are not well trained to take the images,” he says, which implies a need to “train the doctors.” However, the second study involved dermatologists who specialise precisely in photographing suspicious lesions for subsequent diagnosis. According to Fernández Rodríguez, reliability improved after “cropping the images properly,” because “they were not following the instructions exactly.”
Independent sources criticise the diagnostic tool
“For skin cancer, sensitivities of 70% are very bad. It’s very poor. If you give this to someone to take a photo and tell you whether it could be a melanoma, and it gets it wrong in one out of three cases, it is inappropriate for skin cancer screening in primary care. You have to demand more,” explains Dr Josep Malvehy Guilera, director of the Skin Cancer Unit at the Hospital Clínic in Barcelona. “A 31% false negative rate sounds dangerous, to say the least,” says Dr Rosa Taberner Ferrer, dermatologist at Son Llàtzer Hospital in Mallorca and author of the Dermapixel blog: “As a screening test it’s crap.”
However, Fernández Rodríguez tries to downplay the problem by focusing only on the data that favours his product, avoiding any mention of Quantus Skin’s low sensitivity. In fact, the algorithm fails in both directions: according to the same study, its specificity of 80.2% implies a 19.8% false positive rate, i.e. it mistakes roughly one in five benign moles for melanoma, which could lead to unnecessary referrals for around 20% of the healthy patients screened.
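For readers who want to see the arithmetic, the back-of-the-envelope sketch below translates the two published figures (69.1% sensitivity, 80.2% specificity) into expected outcomes for a hypothetical batch of lesions; the batch sizes are invented purely for illustration and are not from the study.

```python
# Illustrative arithmetic only: translates the published figures
# (69.1% sensitivity, 80.2% specificity) into expected outcomes for a
# hypothetical batch of lesions. The batch sizes are invented for the example.
sensitivity = 0.691   # share of true melanomas the algorithm flags
specificity = 0.802   # share of benign lesions it correctly clears

melanomas = 100       # hypothetical number of true melanomas screened
benign_moles = 100    # hypothetical number of benign lesions screened

missed_melanomas = melanomas * (1 - sensitivity)   # false negatives
false_alarms = benign_moles * (1 - specificity)    # false positives

print(f"Melanomas missed: ~{missed_melanomas:.0f} of {melanomas}")                   # ~31
print(f"Benign moles flagged as suspicious: ~{false_alarms:.0f} of {benign_moles}")  # ~20
```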
In the study, the authors – dermatologists at the Ramón y Cajal Hospital in Madrid and optics professors at the Complutense University of Madrid – argue that it is preferable for Quantus Skin to have higher specificity (fewer false positives) even at the cost of lower sensitivity (more false negatives), because it is not used for definitive diagnoses: it is just a screening tool for referring cases from primary care. They hypothesise that this could prevent specialist clinics from becoming overcrowded, reducing waiting times and lowering medical costs.
“If it misdiagnoses melanoma in lesions that risk growing rapidly and even causing the patient’s death, then I have to be very demanding. I must demand sensitivities of at least 92%, 93%, 94%.”
Specialists consulted by Civio disagree. Although there is no ideal standard for cancer diagnosis – partly because it depends on the aggressiveness of each tumour – what Quantus Skin achieves falls far short of acceptable. “If it misdiagnoses melanoma in lesions that risk growing rapidly and even causing the patient’s death, then I have to be very demanding. I must demand sensitivities of at least 92%, 93%, 94%,” says dermatologist Malvehy Guilera of the Hospital Clínic in Barcelona.
“If they intend to use it for screening, then the system should have a super high sensitivity at the expense of a slightly lower specificity,” explains Taberner Ferrer. In other words, it is preferable for an algorithm like this to be overly cautious: better to err a little by generating false alarms in healthy people than to miss a real case of cancer.
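As an illustration of the trade-off the specialists describe, the sketch below compares the published Quantus Skin operating point with a hypothetical high-sensitivity setting of the kind Malvehy Guilera demands. Only the 69.1%/80.2% figures come from the published study; the cohort size, the melanoma prevalence and the alternative 93%/75% operating point are assumptions made up for this example.

```python
# Sketch of the screening trade-off described above. Only the 69.1%/80.2%
# operating point comes from the published study; the cohort size, the
# melanoma prevalence and the alternative 93%/75% operating point are
# assumptions invented purely for illustration.
def screening_outcomes(n_patients, prevalence, sensitivity, specificity):
    """Return (melanomas missed, unnecessary referrals) for a screened cohort."""
    cancers = n_patients * prevalence
    healthy = n_patients - cancers
    missed = cancers * (1 - sensitivity)          # false negatives
    over_referred = healthy * (1 - specificity)   # false positives
    return missed, over_referred

N = 1_000     # hypothetical number of screened patients
PREV = 0.02   # assumed share of screened lesions that are melanomas (illustrative)

for label, sens, spec in [
    ("Published Quantus Skin figures", 0.691, 0.802),
    ("High-sensitivity screening setting (assumed)", 0.93, 0.75),
]:
    missed, extra = screening_outcomes(N, PREV, sens, spec)
    print(f"{label}: ~{missed:.1f} melanomas missed, "
          f"~{extra:.0f} unnecessary referrals per {N} patients")
```

Under these assumed numbers, the high-sensitivity setting misses far fewer cancers at the cost of somewhat more false alarms, which is exactly the direction the dermatologists argue a screening tool should lean.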
Dark skin, uncertain diagnosis
The problems with Quantus Skin go beyond its low sensitivity. The published paper evaluated its efficacy only in diagnosing melanoma; it did not look at other more common but less aggressive skin cancers, such as basal cell carcinoma and squamous cell carcinoma, to which Quantus Skin is also meant to be applied. Nor did the authors study how skin colour affects the algorithm’s performance, although they acknowledge this as one of the main limitations of their research.
The diversity that Quantus Skin neglects
At the beginning of 2025, the Basque Country had 316,942 people of foreign origin, according to data from the Ikuspegi Basque Immigration Observatory. More than 60,000 came from the Maghreb and sub-Saharan Africa, while nearly 164,000 people came from Latin America, where there is a great variability of skin tones. That is not counting people born in Spain with foreign ancestry who reside in the Basque Country, such as the well-known footballers Iñaki and Nico Williams.
Quantus Skin, based on neural networks, has learned to recognise skin cancer almost exclusively in white people. The algorithm was first fed with just over 56,000 images from the International Skin Imaging Collaboration (ISIC), a public repository of medical photographs collected mainly by Western hospitals, where the majority of patients have fair skin. Subsequently, Quantus Skin was re-trained using images of 513 patients from the Ramón y Cajal Hospital in Madrid, all of whom were white.
The data set used to train Quantus Skin includes images “of Caucasian males and females,” Fernández Rodríguez says. “I don’t want to get into the issue of ethnic minorities and all that, because the tool is used by the Basque Country, by Osakidetza. What I am making available is a tool, with its limitations,” he adds. Despite the lack of training on darker skin, the Basque government says it is not necessary to “implement” any measure “to promote equality and non-discrimination,” as stated in the Quantus Skin entry in the Basque Country’s catalogue of algorithms and artificial intelligence systems. However, because the neural networks have been trained almost exclusively on images of white people, they are likely to fail even more often on darker skin, for example in people of Roma ethnicity or migrants from Latin America and Africa.
“Algorithms are so easily fooled.”
“Algorithms are so easily fooled,” says Adewole Adamson, professor of dermatology at the University of Texas, who warned in 2018 of the discrimination that artificial intelligence could lead to if it was not developed in an inclusive and diverse way.
His predictions have been confirmed. In dermatology, when algorithms are fed mainly with images of white patients, “diagnostic reliability in dark skin decreases,” says Taberner Ferrer. The precision of the Skin Image Search algorithm, from the Swedish company First Derm, trained mainly on photos of white skin, dropped from 70% to 17% when tested on people with dark skin. More recent research has confirmed that such algorithms perform worse on black people, which is not due to technical problems, but to a lack of diversity in the training data.
Although melanoma is a much more common cancer in white people, people with dark skin who develop it have a significantly lower survival rate. American engineer Avery Smith knows these figures well. His wife, Latoya Smith, was diagnosed with melanoma only a year and a half after they married. “I was so surprised by the survival rates listed for people with the same diagnosis as my wife, and by how they were dependent on race. My wife and I are both Black American and we were at the bottom of the survival rate. I didn’t know until it hit me like a bus. That’s scary as hell,” he tells Civio. Some time after the diagnosis, in late 2011, Latoya died.
Since then, Smith has been working to make dermatology more inclusive and to ensure that algorithms do not amplify inequalities. To underline the impact they can have, especially on vulnerable groups, Smith rejects describing artificial intelligence as a “tool,” as if it were a simple pair of scissors: “It’s a marketing term. It’s a way to get people who aren’t technologists to grasp it. But it’s far more than just a tool.”
Legal expert Anabel K. Arias, spokesperson for the Federation of Consumers and Users (CECU), also speaks of these effects: “When thinking about using it to make an early diagnosis, there may be a portion of the population that is under-represented, and in that case, it may be wrong and have an impact on the health of the person. You can even call it harm.”
Invisible patients in the eyes of an algorithm
“People tend to put too much trust in artificial intelligence; we attribute to it qualities of objectivity that are not real,” says Helena Matute Greño, professor of experimental psychology at the University of Deusto. Any AI uses the information it receives to make decisions. If that input data is poor or incomplete, the system is likely to fail. When it fails systematically, the algorithm makes errors that we call biases. And when those errors affect one group of people more than others – because of their origin, skin colour, gender or age – we call them discriminatory biases.
A review published in the Journal of Clinical Epidemiology showed that only 12% of studies on AI in medicine looked for bias. And when they did, the most frequent bias was racial bias, followed by gender and age, with the vast majority affecting groups that had historically suffered discrimination. These errors can occur if the training data are not sufficiently diverse and balanced: if algorithms learn from only part of the population, they perform worse on different or minority groups.
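By way of illustration, the kind of check most of those studies omitted is conceptually simple: compute the same performance metric separately for each group. The minimal sketch below does this for sensitivity by skin-tone group; every record in it is invented, and the group labels are hypothetical.

```python
# Minimal sketch of a per-group bias check: compute sensitivity separately for
# each skin-tone group. Every record below is invented; in practice they would
# come from a labelled evaluation set that stores a skin-tone label
# (e.g. a Fitzpatrick phototype) for each image.
from collections import defaultdict

# (skin_tone_group, true_label, predicted_label) -- hypothetical evaluation records
records = [
    ("I-II", "melanoma", "melanoma"), ("I-II", "melanoma", "benign"),
    ("I-II", "benign",   "benign"),   ("V-VI", "melanoma", "benign"),
    ("V-VI", "melanoma", "benign"),   ("V-VI", "benign",   "benign"),
]

detected = defaultdict(int)
total = defaultdict(int)
for group, truth, prediction in records:
    if truth == "melanoma":                      # sensitivity only counts true cancers
        total[group] += 1
        detected[group] += (prediction == "melanoma")

for group in sorted(total):
    print(f"Group {group}: sensitivity {detected[group] / total[group]:.0%} "
          f"({detected[group]}/{total[group]} melanomas detected)")
```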
Errors are not limited to skin colour. Commercial facial recognition technologies misclassify black women more often because they have historically been trained on images of white men. Similarly, algorithms that analyse chest X-rays or predict cardiovascular disease perform worse in women if the training data is unbalanced. One of the most widely used datasets for predicting liver disease is so skewed – 75% of the training examples are male – that algorithms built on it perform far worse in women. In the UK, the algorithm for prioritising organ transplants discriminated against younger people. The reason? It had been trained on limited data that only took into account survival over the following five years, not the potentially much longer life that patients receiving a new organ might gain.
“The data used for training must represent the entire population in which it will be used,” explains Nuria Ribelles Entrena, spokesperson for the Spanish Society of Medical Oncology (SEOM) and oncologist at the Virgen de la Victoria University Hospital in Malaga: “If I only train it with a certain group of patients, it will be very effective in that group, but not in others.”
Age bias is especially problematic in paediatrics. “Children are not little adults. They have completely different physiology and pathological processes,” warn the authors of a journal article. Since children do not normally participate in clinical research, the situation is “a drama,” according to Antonio López Rueda, spokesperson for the Spanish Society of Medical Radiology (SERAM) and radiologist at Bellvitge University Hospital in Barcelona.
Ignasi Barber Martínez de la Torre, spokesperson for the Spanish Society of Paediatric Radiology (SERPE) and head of paediatric radiology at Sant Joan de Déu Hospital, illustrates this with a personal experience. His team tried to validate, in the paediatric population, a chest X-ray model trained on adults. “We soon realised that it made many more errors. The sensitivity and specificity were totally different,” he says. One of those errors was flagging the thymus – a gland that is very large in young children and disappears in adulthood – as suspicious. The same goes for the skeleton, which in young children has “unossified parts” that can be mistaken for fractures.
Navigating the bias obstacle course
The solution for avoiding bias exists: “The training set has to be as large as possible,” explains López Rueda. But the data are not always available for independent analysis. Most of the artificial intelligence systems using medical images that have been implemented in Spain so far do not publish their training data. This is the case with two dermatology systems – whose names are not even public – that will first be tested in the Caudal health area and then extended to the whole Principality of Asturias. The same applies to the commercial application ClinicGram, used to detect diabetic foot ulcers at the University Hospital of Vic near Barcelona, and to several private radiology systems, such as Gleamer BoneView and ChestView and Lunit, which operate in the Community of Madrid, the Principality of Asturias and the Community of Valencia: none of them publish their training data.
Where training datasets are accessible, another obstacle is that they often lack metadata, such as origin, gender, age or skin type, that would allow anyone to check whether they are inclusive and balanced. In dermatology, most public datasets do not record patients’ origin or skin tone. When this information is included, studies consistently show that the black population is severely underrepresented. “There has been a growing awareness of the problem and developers of algorithms have tried to address these shortcomings. However, there is still work to be done to create representative training data for algorithms,” Adamson says.
The quality and quantity of available data also determine how well the algorithms work. “What made us improve our diagnostic efficiency was that we used our own imaging resources,” says Julián Conejo-Mir, professor and head of dermatology at the Virgen del Rocío University Hospital in Seville. Conejo-Mir and his colleagues developed an artificial intelligence algorithm to diagnose skin cancer and to estimate the depth of melanoma, a parameter related to the aggressiveness of these tumours.
Their database, which includes images of nearly a thousand patients from the Seville hospital together with photographs from other repositories, has been used to design an algorithm, currently at the research stage, with a 90% accuracy rate. But even in apparently successful systems like this one, it is difficult to train algorithms to recognise less frequent cases. This is precisely what happens with acral lentiginous melanoma, the most common form of melanoma in the black population and the cancer that killed Bob Marley when he was only 36 years old. This tumour is particularly treacherous because it appears in places where people rarely look for suspicious lesions, such as the palms of the hands, the soles of the feet or under the nails, as happened to Marley.
Every year, the dermatology service at the Virgen del Rocío University Hospital in Seville diagnoses around 150 cases of melanoma, of which only 2 or 3 are acral lesions. “We had to take it out of training, because we had very few cases and, if we joined it to the rest, it failed; if we separated it, we didn’t have a sufficient number of images,” says José Juan Pereyra Rodríguez, head of section at the same hospital.
This artificial intelligence, which is used for research rather than clinical screening, cannot be applied to cases of acral lentiginous melanoma because the team did not have enough data on this type of cancer to train the algorithm reliably. To do so, they would have needed about 50 years’ worth of locally available data, Pereyra Rodríguez estimates. “In our case, it’s as simple as saying: ‘Don’t use the algorithm for acral lesions, in general, because I haven’t trained it for that.’ That’s it; it’s a limitation,” he says.
“The theory says that if 90% of my population” corresponds to white skin, “I must train” with that type of skin, “because prevalence is also important when it comes to making decisions. I must train in my environment,” Pereyra Rodríguez says. In the case of systems developed abroad, hospitals should ideally evaluate the performance of the algorithms on their own patient populations. López Rueda also calls for “re-training with local data” before implementing any artificial intelligence: “It’s very expensive for both the company and the hospital, but that’s what would really work.”
“Biased algorithms rob patients of the potential benefits of this revolutionary technology.”
Even within Spain, the characteristics of the population vary from one postcode to another. “If I develop software in the Hospital Clínic [in the centre of Barcelona] and implement it in Bellvitge [in the suburbs], it won’t work for me. If I do it the other way around, it won’t work either,” says López Rueda. The consequences of algorithmic biases can be truly disastrous: patients can be harmed by an incorrect diagnosis. “Biased algorithms rob patients of the potential benefits of this revolutionary technology,” says Adamson, who points to the root of the problem: “The problem isn’t with the algorithm but with the thought and care going into designing and developing the algorithms.”