Journal of Medical Internet Research

Introduction

Across many health conditions, early diagnosis is crucial for successful treatment and long-term outcomes, potentially impacting survival and health care costs [-]. Early diagnosis depends on detection of signs and symptoms, presentation to health care professionals, and timely diagnostic tests. However, patients often present to health care weeks or months after the onset of symptoms, and a key reason for this delay is the time taken to interpret symptoms as potentially serious [,]. Thus, the understanding of timing of symptom perception, symptom interpretation, and help-seeking behavior is vital in efforts to encourage early diagnosis.

“Digital footprints” capture huge amounts information, including online activity, in-store purchases, wearable device data, and social media engagement []. These digital data may provide insights into symptoms, changes in behavior, and the decision to seek help [,], and using these data may tackle current research limitations in this area. To date, researchers studying time to presentation have relied on retrospective self-report from patients recently diagnosed with the condition under investigation []. These data are subject to recall bias and could be affected by hindsight from the recent diagnosis [,].

In the United Kingdom, there are almost 50 million health-related searches per year using Google []. Internet use data include internet search history, calendar data, bookmarks, contacts, emails, photos, reminders, profiles, and documents stored online. Given the richness of these data, there is growing interest to explore the use of internet search data and other digital footprints as digital signals of serious disease. Preliminary research has begun to examine whether data on internet search history could be used to predict diagnosis of cancers and mental health conditions, with promising results. Changes in the frequency, content, and tone of online searches were evident up to 60 days before suicide attempts or general practitioner referral for gynecological cancers, up to 21 weeks before pancreatic cancer diagnosis, and up to 1 year before psychiatric hospitalization [-].

Despite the promise of the digital signals, there are multiple questions that need to be explored regarding the acceptability of this kind of research for patients and the public, especially if research involves sharing internet use data on an individual level and linking with future health outcomes such as a cancer diagnosis. Previous studies have mixed results regarding people’s willingness to share different types of digital data over “traditional” types of health data such as biomedical samples or questionnaire data [,]. It is unclear if internet users are aware that these data exist, where they are stored, and how they could be used in research or if effective ways of explaining these issues to potential participants exist. This is key to informed consent.

Willingness to share might depend on several factors, including the type of data being shared, the way internet data are explained, the purpose of the research, privacy concerns, and sociodemographic factors []. If some groups are less willing to provide internet use data, findings from future research might not be representative and could thus perpetuate health inequalities. For example, gender, age, ethnicity, or socioeconomic status such as employment might influence willingness to share internet use data and other types of data such as location data or measures from health and fitness apps [-]. Further, the type of condition being studied and the relevance to an individual (eg, through personal history or experiences of friends and family being affected by a mental or physical health condition) can influence willingness to participate in medical research [], yet it is unknown if this applies to willingness to share internet use data.

This study therefore investigated public perspectives on the use of internet use data for medical research to identify key criteria that could enhance acceptability and avoid inequalities in engagement. Our research questions were exploratory:

How willing are the public to share different types of internet use data for early diagnosis research?Does willingness to share internet use data vary for different mental and physical health conditions?Does the way internet use data is explained impact willingness to share internet use data?Are sociodemographic characteristics, personal history of different mental and physical health conditions, familiarity with internet use data, and attitudes toward data sharing associated with willingness to share internet use data?
MethodsStudy Design

We used a cross-sectional design to conduct an online survey on willingness to share different types of internet use data. We randomly assigned participants using an automated algorithm based on 2 factors: (1) the medical condition about which participants are asked whether they would be willing to share their data (cancer, heart disease, and depression) and (2) the way internet use data was explained (written text plus a screenshot as a pictorial example of internet use data vs written text only). Cancer, depression, and heart disease were selected, as these are common conditions for which early diagnosis is key and there is emerging evidence for use of digital data in prediction models [-]. We reported the study in line with Journal Article Reporting Standards Quantitative Research Design (JARS-Quant) standards [].

Ethical Considerations

Ethical approval for research with human subjects was granted by the Queen Mary Ethics of Research Committee (reference number QME25.0924). Participants received the participant information sheet for download at the beginning of the survey and provided informed consent for participation and data sharing before they could access the survey. All data were collected anonymously. In line with Dynata’s company policies, participants received tokens through the loyalty programs set up through Dynata partnerships automatically upon survey completion, which could be redeemed for vouchers or gifts. Incentives were valued at approximately £0.75 (US $1.00). No identification of individual participants in any images of the manuscript or appendices is possible.

Participants and Procedure

Participants were adult members of the public (18 years or older), both with and without a personal history of cancer, heart disease, or depression. The survey was hosted on the online platform Qualtrics. Participants were recruited through Dynata from participants who had previously signed up to an online panel from July 1, 2025, to July 23, 2025. As the panel was English-speaking and conducted initial checks with new participants, we assumed all participants would be proficient in the English language. Before full recruitment, the survey was pilot tested with patient and public representatives (see the following sections). To ensure data quality, it was then released to 10% of the final population to prevent technical problems and unclear questions before full roll-out.

The survey was administered using quota sampling based on gender (approximately 50% male and 50% female), age (approximately 50% of the population over the age of 50 years), income (approximately 50% with disposable household income levels <£35,000), and ethnicity (maximum 85% of the population from a White ethnic group). A minimum of 300 participants with a history of cancer, heart disease, or depression was sought. We chose to sample for age to increase the proportion of older participants, who have a higher incidence of cancer and cardiovascular disease. Online survey panels often have a disproportionately higher share of younger respondents [], and we anticipated that younger participants would be more likely to take part in a survey on the topic of internet use and data.

Measures

The survey assessed willingness to share data for early diagnosis research, attitudes toward data sharing, familiarity about internet use data, and sociodemographics. All items and the pictorial example can be found in .

Willingness to Share Internet Use Data for Early Diagnosis Research and Attitudes Toward Data Sharing

The main outcome measure was willingness to share data, assessed as a single item adapted from Hirst et al [] with 4 answer options (definitely yes to definitely no). Where appropriate, this was recoded to a binary variable (yes, no) for analyses. The same question was used to assess willingness to share specific types of internet use data (eg, internet search history or online purchases), with an option to choose “Not applicable (I do not use this).”

Open-ended questions were used to ask participants what would increase their willingness to share internet use data, as well as their concerns about data sharing.

Attitudes toward data sharing were assessed as perceived benefits and concerns or worries using an adaptation of scales by Sanderson et al [] and Seltzer et al []. We used matrix questions to include perceived benefits and concerns and calculated mean scores (Cronbach α=0.96 for perceived benefits and 0.94 for concerns).

Familiarity With Internet Use

We assessed participants’ self-reported familiarity with internet use data by adapting the scale by Hirst et al [] on awareness about General Data Protection Regulation (GDPR). We asked whether participants had heard about internet use data before. Objective knowledge was assessed by asking what participants thought was part of internet use data from a list of examples (eg, internet search history as example of internet use data or content of emails as distractor). Correct answers were counted to calculate a knowledge score (range 1‐10).

Sociodemographics and Current Internet Use

For sociodemographic characteristics, we assessed age, gender, educational level, employment, marital status, ethnicity, and preferred language. Items were adapted from the DISTINCT project []. We assessed disposable household income (<£35,000 or >£35,000) and postcodes (to determine levels of deprivation via English indices of deprivation 2019 [], Postcode Deprivation Finder 2019 [], and Multiple Deprivation Measures 2017 Lookup Tool []). Finally, we measured self-reported history of cancer, cardiovascular disease, and depression and whether their relatives had a history of these conditions. Answer options were adapted from the Health Survey for England, 2021 [].

To assess current internet and online service use, we used 2 questions from the UK Digital Landscape survey [] about frequency and purpose of internet use.

Data Preparation

There were no missing data in the study variables such as willingness to share internet use data, self-reported familiarity with internet use data, and perceived benefits or concerns, as items were set to mandatory. In the sociodemographic data, <1% were missing, except for disposable household income (2.3%) and deprivation quintile (7.8%). Online surveys are open to fraudulent responses [,]; as a precaution, we introduced a number of steps to ensure credibility of the data. An attention check was added halfway through the survey, asking participants to select “strongly agree” with answer options appearing in a random order. Participants were excluded if they failed the attention check. We also excluded participants who were flagged by Qualtrics as both likely bot and fraudulent respondent (Recaptcha Score<0.5 and Fraud Score>30) or as duplicate (flagged by Qualtrics or on inspection of the data), as well as participants who answered the survey in less than 2 minutes, which was considered to be insufficient time to read through the participant information sheet and questionnaire items. Finally, we excluded participants with the common placeholder “lorem ipsum” text or questionable open answers (eg, “Banana”) on both open-ended questions.

Data Analysis

We descriptively report differences in the willingness to share internet use data for early diagnosis research using percentages of participants indicating they would be willing to share. We conducted χ2 tests for differences in proportions willing to share internet use data between medical conditions, the way internet use data were explained, and history of the condition. Logistic regression analysis was used to determine how sociodemographic characteristics, perceived benefits and concerns, knowledge, and familiarity with internet use data affect willingness to share internet use data, coded as a binary outcome. Separate regression models were run for willingness to share internet use data for each medical condition (cancer, heart disease, depression).

Responses to open-ended questions were analyzed using a thematic content analysis approach and reported alongside how willing participants were to share internet use data. Categories were developed by two authors (CD and WL) and discussed with the research team including patient and public involvement (PPI) representatives. Answers not in English (one Romanian and one Polish) were translated using Google Translate, as translated content fit the questions, showing that participants understood English but answered in a different preferred language.

The full syntax and the final datasets can be found on Open Science Framework (OSF) [].

Power Analysis

Based on previous research [], we made the conservative assumption that willingness to share internet use data would likely vary between 45% and 55% of participants being willing to share. For an a priori sample size calculation, we therefore assumed a difference of 0.1 in probabilities of being willing to share data or not for a binary factor (0.55 in one group, 0.45 in the other; odds ratio [OR] 1.49). To reach a power of 80% given an α-error level of .05, we determined a sample size of 786 participants per regression. As logistic regression analysis was repeated for each condition, we aimed for a sample size of 2358 participants (3 × 786). However, the sample sizes for the logistic regression analyses were lower than the targeted sample size, so analyses were repeated without the variable with the most missing data (level of deprivation; see the following sections) for sensitivity analysis.

Patient and Public Involvement

We presented initial ideas for survey items to 3 PPI representatives who were older than 55 years and from a minority background. They were recruited from the Centre for Cancer Screening, Prevention and Early Diagnosis PPI pool at Queen Mary University of London, which is a database of more than 150 people with an interest in contributing to research. They endorsed the research but asked to add questions about emotions (fear and embarrassment) to the items on attitudes toward data sharing. The group asked that the participant information sheet emphasized that data sharing was not required for participation in this study and to change the wording of some items to facilitate understanding. Following these changes, we piloted the survey with 2 PPI representatives in think-aloud sessions to ensure face validity []. The PPI group was also involved in interpreting the results and discussing implications for future research and practice.

ResultsParticipant Characteristics

Of the 2864 participants who gave informed consent and completed the survey, 474 were excluded: 365 failed the attention check, 32 were flagged as both likely bot and fraudulent respondent, 64 were duplicate respondents, 6 completed the survey in less than 2 minutes, and 7 gave questionable open answers. Hence, 2390 participants with a mean age of 50.2 (SD 16.86, range 18‐90) years were included in the analyses. Participant characteristics are provided in .

Table 1. Participant characteristics from a cross-sectional survey in 2025 conducted with UK participants recruited through an online survey panel (N=2390).CharacteristicResults, n (%)GenderMan1250 (52.3)Woman1135 (47.5)Nonbinary4 (0.2)Prefer not to say1 (<0.1)Employment statusEmployed (including self-employed)1433 (60)Not employed352 (14.7)Retired592 (24.8)Prefer not to say13 (0.5)Highest educational qualificationsNo formal qualifications73 (3.1)GCSEs or equivalent408 (17.1)AS, A level or equivalent296 (12.4)NVQ up to level 3 or equivalent256 (10.7)Qualification at degree level or above1250 (52.3)Apprenticeship51 (2.1)Other or equivalent unknown52 (2.2)Prefer not to say4 (0.2)Disposable household income (£)<35,0001206 (50.5)>35,0001127 (47.2)Prefer not to say57 (2.3)Deprivation quintile based on postcode1 (most deprived)464 (19.4)2522 (21.8)3425 (17.8)4385 (16.1)5 (least deprived)408 (17.1)Missing186 (7.8)Marital statusSingle662 (27.7)Married or in legal relationship1368 (57.2)Widowed, separated or divorced348 (14.6)Prefer not to say12 (0.5)EthnicityWhite1978 (82.8)Asian or Asian British127 (5.3)Black, Black British, Caribbean or African216 (9)Mixed or multiple ethnic groups51 (2.1)Other8 (0.3)Prefer not to say10 (0.4)Preferred languageEnglish2376 (99.4)Other12 (0.5)Prefer not to say2 (0.1)Personal history of medical conditionsCancer398 (16.7)Heart disease or vascular problems422 (17.7)Depression745 (31.2)None of the above1047 (43.8)Not sure/not formally diagnosed126 (5.3)Prefer not to say16 (0.7)Relatives’ history of medical conditionsCancer656 (27.4)Heart disease or vascular problems633 (26.5)Depression535 (22.4)None of the above1005 (42.1)Not sure/not formally diagnosed159 (6.7)Prefer not to say24 (1)KnowledgeInternet search history2143 (89.7)Internet browsing history2141 (89.6)Health app data1717 (63.5)Online banking details (incorrect)1021 (42.7)App store data1490 (62.3)Content of emails (incorrect)1050 (43.9)Purchases made online1650 (69)Location history1831 (76.6)Internet searches in “incognito” mode (incorrect)813 (34)Video streaming history1720 (72)Self-reported familiarity with internet use dataNot familiar792 (33.1)Heard about internet use data850 (35.6)Some knowledge512 (21.4)Extensive knowledge236 (9.9)

aGCSE: General Certificate of Secondary Education.

bAS: Advanced Subsidiary.

cNVQ: National Vocational Qualification.

dNumber and percentage of participants indicating this is part of internet use data.

Willingness to Share Internet Use Data

There were no differences in proportions of respondents willing to share internet use data between the 3 medical conditions (for early detection of cancer=77%; heart disease=74%; depression=74%). When asked about different types of data, participants were most willing to share their health app data (73%‐76%, 95% CI 69.8%‐79.1%), whereas participants were less willing to share other types of data, especially regarding purchases made online (52%‐57%, 95% CI 48.7%‐60.3%) and their location history (58%‐59%, 95% CI 54.2%‐62.2%). Details are provided in . There was variability in what participants understood to be a part of internet use data, with more than 40% of participants incorrectly believing that the content of their emails (1050/2390, 43.9%) and online banking history (1021/2390, 42.7%) would be included (see ).

Table 2. Willingness to share internet use data for research on different medical conditions from a cross-sectional survey in 2025 with UK participants recruited through an online survey panel.TypeWillingness to share internet use data for research on early detection of…Comparison, χ² (df)P valueCancerHeart diseaseDepressionTotal users, nUsers willing to share, n (%)95% CITotal users, nUsers willing to share, n (%)95% CITotal users, nUsers willing to share, n (%)95% CIInternet use data in general795615 (77.4)74.4‐80.3798587 (73.6)70.5‐76.6797592 (74.3)71.2‐77.34.64 (2,2390).18Internet search history779532 (68.3)65.0‐72.6777509 (65.5)62.2‐68.9787541 (68.7)65.5‐72.02.18 (2,2343).34Internet browsing history784519 (66.2)62.9‐69.5783493 (63.0)59.6‐66.4792535 (67.6)64.3‐70.83.87 (2,2359).14Health app data682498 (73.0)69.7‐76.4670508 (75.8)72.6‐79.1688503 (73.1)69.8‐76.41.78 (2,2040).41App store data730446 (61.1)57.6‐64.6732455 (62.2)58.6‐65.7748490 (65.5)62.1‐68.93.37 (2,2210).19Purchases made online770402 (52.2)48.7‐55.7777414 (53.3)49.8‐56.8788448 (56.9)53.4‐60.33.72 (2,2335).16Location history771453 (58.8)55.3‐62.2768443 (57.7)54.2‐61.2774447 (57.8)54.3‐61.20.23 (2,2313).89Video streaming history750463 (61.7)58.3‐65.2751471 (62.7)59.3‐66.2756499 (66.0)62.6‐69.43.26 (2,2257).20

aTotal numbers and percentages for willingness to share include valid participants who are using these services and answered, “definitely yes” and “probably yes.”

Participants were descriptively less willing to share their internet use data if a pictorial example of internet use history was provided alongside written text compared with written text only, but this did not reach statistical significance for research on any of the 3 medical conditions (see ).

Having a personal or family history of the condition being researched was not associated with willingness to share internet use history for that condition, with the exception of those with a personal history of depression. Participants who had a history of depression were more likely to be willing to share their internet use data for the early detection of depression (79.7%, 95% CI 74.5%‐84.8%) than participants without a history of depression (72.2%, 95% CI 67.8%‐76.5%; P=.03; ).

Table 3. Willingness to share internet use data in relation to how the internet search history was described and personal and family history of the medical condition being researched from a cross-sectional survey in 2025 with UK participants recruited through an online survey panel.CharacteristicWillingness to share internet use data for research on early detection of…CancerHeart diseaseDepressionN (%)95% CIComparison, χ² (df)P valueN (%)95% CIComparison, χ² (df)P valueN (%)95% CIComparison, χ² (df)P valueDescription of internet use3.56 (1,795).062.05 (1,798).152.02 (1,797).16Written description312 (80.2)76.2‐84.2298 (75.8)71.6‐80.1294 (76.6)72.3‐80.8Pictorial example and written description303 (74.6)76.2‐84.2289 (71.4)66.9‐75.8298 (72.2)67.8‐76.5Personal history of medical condition being researched3.60 (1,790).0580.36 (1,790).554.82 (1,794).03Yes116 (83.5)77.2‐89.7103 (75.7)68.4‐83.0188 (79.7)74.5‐84.8No495 (76.0)72.8‐79.3479 (73.2)69.8‐76.6403 (72.2)68.5‐76.0Family history of medical condition being researched3.17 (1,792).080.95 (1,789).333.17 (1,792).08Yes171 (81.8)76.6‐87.1159 (76.4)70.6‐82.3133 (76.9)70.5‐83.2No442 (75.8)72.3‐79.3424 (73.0)69.4‐76.6449 (73.4)69.9‐76.9

aTotal numbers and percentages for willingness to share include participants answering, “definitely yes” and “probably yes.”

Key Factors Associated With Willingness to Share Internet Use Data

Multivariable logistic regression analysis on willingness to share internet use data for early detection of cancer indicated that willingness to share was associated with perceived benefits and perceived concerns of data sharing. Higher endorsement of benefits of sharing data was associated with a greater willingness to share (OR 8.588, 95% CI 5.687‐12.970). A higher endorsement of concerns was associated with a lower willingness to share (OR 0.432, 95% CI 0.290‐0.643). Ethnicity was significantly associated with willingness to share, with individuals of Asian or Asian British ethnicity demonstrating a significantly lower willingness to share than White participants (OR 0.234, 95% CI 0.076‐0.723). No other demographic factors, including age, gender, employment, income, marital status, education level, personal or family history of cancer, or level of deprivation, were associated with a willingness to share in the multivariable regression analysis, although some (education, employment, ethnicity and income) had indicated associations in univariate analyses (see ). Factors related to digital literacy (knowledge about internet use data, the number of online services used, or self-reported familiarity with internet use data) were not significantly related to willingness to share internet use data for cancer research in the multivariable regression analysis.

Willingness to share internet use data for the early detection of heart disease and depression followed similar patterns, with perceived benefits being significantly associated with a higher willingness to share internet use data (heart disease: OR 5.692, 95% CI: 4.050‐7.998; depression: OR 8.850, 95% CI 5.998‐13.059) and perceived concerns being significantly associated with a lower willingness to share internet use data (heart disease: OR 0.424, 95% CI 0.292‐0.617; depression: OR 0.343, 95% CI 0.228‐0.516). For research on detection of heart disease, self-reported familiarity with internet use data was associated with a lower willingness to share internet use data (OR 0.740, 95% CI 0.561‐0.976). For research on the early detection of depression, older participants were less willing to share their data (OR 0.975, 95% CI 0.951‐0.999), and men were more likely to be willing to share than women (OR 2.615, 95% CI 1.511‐4.526).

All results are displayed in .

Sample sizes for the logistic regression analyses were lower than intended due to missing sociodemographic data, especially for levels of deprivation. We therefore conducted sensitivity analyses by repeating the logistic regression analyses without levels of deprivation. Results were similar to the initial logistic regression analysis (see ), except for depression, where age was no longer associated with willingness to share their data (OR 0.981, 95% CI 0.959‐1.003) and those with a higher knowledge score about internet use data were less willing to share internet use data (OR 0.842, 95% CI 0.723‐0.981). For early detection of cancer, Asian and Asian British were not significantly less likely to share their internet use data (OR 0.351, 95% CI 0.122‐1.011).

Table 4. Logistic regression on willingness to share internet use data for research on the early detection of cancer from a cross-sectional survey in 2025 with UK participants recruited through an online survey panel.VariableWillingness to share internet use data for research on early detection of…Cancer (n=670)Heart disease (n=646)Depression (n=673)Willing,
n (%)OR (95% CI)SEP valueWilling,
n (%)OR (95% CI)SEP valueWilling,
n (%)OR (95% CI)SEP valueDescription of internet useWritten description269 (80.5)1——246 (78.8)1——244 (76.0)1——Pictorial example and written description257 (76.5)0.617 (0.357-1.066)0.279.08243 (72.8)0.735 (0.450-1.198)0.250.22263 (74.7)1.200 (0.723-1.989)0.258.48GenderWomen240 (80.5)1——228 (74.8)1——224 (70.2)1——Men286 (76.9)0.980 (0.560-1.716)0.286.94261 (76.5)1.017 (0.607-1.704)0.263.95283 (79.9)2.615 (1.511-4.526)0.280<.001EducationDegree level or above296 (80.9)1——263 (79.7)1——294 (80.1)1——Basic education98 (72.1)0.590 (0.300-1.159)0.344.13101 (71.6)0.688 (0.343-1.380)0.36.2990 (67.7)0.614 (0.291-1.296)0.381.20Advanced education132 (78.6)0.754 (0.390-1.455)0.336.40125 (71.4)0.627 (0.347-1.130)0.301.12123 (71.1)0.689 (0.362-1.313)0.329.26EmploymentEmployed324 (83.7)1——297 (76.3)1——329 (78.5)1——Unemployed69 (67.6)0.730 (0.353-1.509)0.371.4060 (74.1)1.172 (0.518-2.656)0.417.7066 (75.0)1.303 (0.575-2.950)0.417.53Retired133 (73.5)1.194 (0.521-2.732)0.422.68132 (75.0)1.251 (0.583-2.687)0.390.57112 (67.5)1.770 (0.826-3.792)0.389.14Income (£)<35,000251 (74.5)1——254 (73.6)1——239 (72.2)1——>35,000275 (82.6)1.327 (0.703-2.503)0.324.38235 (78.1)0.826 (0.459-1.487)0.300.52268 (78.4)0.775 (0.406-1.479)0.330.44Marital statusMarried or in legal partnership310 (80.9)1——299 (79.7)1——314 (79.1)1——Single131 (74.4)1.250 (0.625-2.499)0.353.53123 (70.7)0.669 (0.364-1.229)0.310.20128 (69.6)1.020 (0.537-1.936)0.327.95Divorced/ widowed/ separated85 (76.6)1.057 (0.511-2.186)0.371.8867 (69.1)0.754 (0.362-1.571)0.374.4565 (70.7)1.294 (0.575-2.915)0.414.53EthnicityWhite445 (78.1)1——416 (75.0)1——408 (73.8)1——Asian or Asian British21 (65.6)0.234 (0.076-0.723)0.576.0122 (81.5)0.843 (0.231-3.084)0.661.8032 (80.0)1.277 (0.394-4.133)0.599.68Black or Black British48 (92.3)0.806 (0.189-3.431)0.739.7743 (82.7)0.478 (0.177-1.291)0.507.1558 (89.2)0.682 (0.225-2.065)0.565.50Mixed or multiple ethnic groups12 (75.0)0.251 (0.062-1.019)0.714.0538 (66.7)0.474 (0.098-2.283)0.802.359 (60.0)0.273 (0.063-1.177)0.745.08Level of deprivationIMD Q1 (most deprived)107 (80.5)1——107 (78.7)1——120 (78.4)1——IMD Q2130 (77.4)1.664 (0.766-3.613)0.396.20113 (75.8)1.116 (0.541-2.302)0.370.77114 (79.7)2.043 (0.910-4.583)0.412.08IMD Q393 (76.9)0.965 (0.417-2.234)0.428.9399 (72.3)1.153 (0.531-2.503)0.395.7293 (73.8)1.450 (0.638-3.295)0.419.38IMD Q494 (77.0)1.162 (0.495-2.729)0.436.7375 (75.0)1.868 (0.809-4.315)0.427.1483 (69.7)1.896 (0.811-4.432)0.433.14IMD Q5 (least deprived)102 (81.0)1.943 (0.793-4.763)0.457.1595 (76.6)1.155 (0.512-2.603)0.415.7397 (73.5)1.142 (0.496-2.628)0.425.76Personal history of condition of interestNo422 (77.3)1——395 (75.1)1——349 (73.0)1——Yes104 (83.9)1.526 (0.720-3.236)0.383.2794 (78.3)0.892 (0.444-1.794)0.357.75158 (81.0)1.345 (0.720-2.513)0.319.35Family history of condition of interestNo370 (76.8)1——352 (73.9)1——390 (74.4)1——Yes156 (83.0)1.189 (0.651-2.171)0.307.57137 (80.6)1.037 (0.564-1.907)0.311.91117 (78.5)1.458 (0.758-2.804)0.334.26Age—0.983 (0.958-1.009)0.013.21—0.984 (0.962-1.008)0.012.18—0.975 (0.951-0.999)0.012.04Perceived benefits about sharing internet use data—8.588 (5.687-12.970)0.210<.001—5.692 (4.050-7.998)0.174<.001—8.850 (5.998-13.059)0.198<.001Perceived concerns about sharing internet use data—0.432 (0.290-0.643)0.203<.001—0.424 (0.292-0.617)0.191<.001—0.343 (0.228-0.516)0.208<.001Knowledge about internet use data (score)—1.061 (0.906-1.242)0.080.46—1.011 (0.866-1.180)0.079.891—0.858 (0.727-1.012)0.084.07Number of online services used—0.923 (0.826-1.031)0.057.16—1.040 (0.942-1.148)0.051.440—1.033 (0.929-1.149)0.054.55Familiarity with internet use data—1.108 (0.814-1.509)0.158.52—0.740 (0.561-0.976)0.141.033—0.854 (0.625-1.167)0.160.32

aNagelkerke R=0.573.

bNagelkerke R=0.516

cNagelkerke R=0.588.

dOR: odds ratio.

eNot applicable.

fIMD: index of multiple deprivation.

Statements on Willingness to Share and Concerns

The open-ended question on what would make them more willing to share their internet use data was answered by 2307 (2307/2390, 96.5%) participants. Percentages of statements relative to numbers of those definitely willing, probably willing, probably unwilling, and definitely unwilling to share are presented in . The main suggestions given by those currently not willing to share were strong assurance of data security (66/596, 11.1%) and incentives (37/596, 6.2%); however, the majority of these participants stated “nothing” would make them more willing (definitely not: 189/236, 80.1%; probably not: 196/360, 54.4%). Those who were “probably” or “definitely” willing to share their internet use data offered a wider range of suggestions, including a sense of contribution to research and society (“helping others”: 480/1794, 26.8%); data security assurances (211/1794, 11.8%); clarification of research purposes (157/1794, 8.6%); incentives (110/1794, 6.1%); strict governance, personal control, and building trust (106/1794, 5.9%); and personal relevance (147/1794, 8.2%).

‎Figure 1. Suggestions to increase willingness to share internet use data from a cross-sectional survey in 2025 with UK participants recruited through an online survey panel. Statements could be coded as multiple categories. Participants saying nothing would make them more willing are not included in this figure.

The question about concerns surrounding sharing internet use data was answered by 2318 (2318/2390, 97%) participants (). Concerns of participants who were “probably not” or “definitely not” willing to share their internet use data mostly focused on data security and the invasion of privacy (160/596, 26.8%) and concerns about data misuse, including sharing with private (marketing) companies (39/596, 6.5%). Participants also questioned the relevance of their internet use data for early detection of disease (76/596, 12.8%). Participants who were “probably” or “definitely” willing to share their internet use data noted similar concerns, although many indicated no concerns (probably willing: 368/716, 51.4%; definitely willing: 794/1078, 73.7%). Examples are provided in .

‎Figure 2. Concerns about sharing internet use data from a cross-sectional survey in 2025 with UK participants recruited through an online survey panel. Statements could be coded as multiple categories. Participants saying they had no concerns are not included in this figure.
DiscussionPrincipal Findings

This study examined willingness to share internet use data for research into early detection of cancer, heart disease, and depression. Participants showed a similarly high willingness to share internet use data for all conditions. Willingness differed by data type, being lowest for online purchases and location history and highest for health app data. Viewing a screenshot of browsing history did not significantly affect willingness for any condition.

Perceived benefits and concerns were key factors associated with willingness across all conditions. Answering open questions about willingness to share, participants reported that contributing to research, personal relevance, data security, incentives, clear research purposes, concrete examples, strong governance, and trust in researchers would improve their willingness to share. In contrast, they reported that privacy concerns, fear of misuse, and low relevance could reduce willingness to share. Those who were more familiar with internet use data were less likely to share data for heart disease detection, but this was not associated with willingness to share data for research on cancer or depression. There were few sociodemographic factors associated with willingness to share, but some did exist: Asian and Asian British participants were less willing to share their internet use data for cancer detection (not replicated in the sensitivity analyses), while younger (not replicated in the sensitivity analyses) and male participants were more willing to share data for depression detection.

Comparison With Prior Work

We found a surprisingly high willingness to share internet use data for health research compared with recent studies. For example, Grande et al [] classified approximately 13% of their participants as “agreeable to sharing,” whereas about 45% of our participants indicated they were “definitely” willing to share. Similarly, Hirst et al [] found that 61% of participants were willing to share their data with research organizations, and only 43% were willing to share with private organizations, compared with 75% in this study. Although we did not differentiate the kind of organization data would be shared with, we consistently found that a higher proportion of participants were willing to share across different types of data than in these studies. Our proportions are based on users of the respective types of data. However, willingness to share was similar when participants were asked more generally about “internet use data.”

As we used a similar recruitment strategy as previous studies [], it seems that the purpose of the data sharing described in our study may be responsible for the higher willingness to share internet use data. In our survey, we asked participants specifically about sharing their internet use data for the early detection of health conditions. Previous research has found public support for screening and early detection [,], especially if personally relevant []. This is line with open-ended comments regarding increased willingness to share if there were personal benefits (eg, receiving a quicker diagnosis). Beyond personal benefits, participants in our study who were willing to share their internet use data described a general sense of contribution to research and society, which our PPI group summarized as an act of “charity” in terms of “donating data.” The Smart Data Research UK report [] reports how the public grew more enthusiastic about using smart data in research for the “public good” when learning and discussing potential applications. However, information about the value of data sharing, as well as ensuring personal control over sharing potentially sensitive information is crucial to ensure acceptability of sharing data [], which is reflected in our findings.

Another suggestion for improving willingness to share internet use data in our study was monetary incentives, which was not captured in our quantitative scales but reported in the free text. Even participants who were currently unwilling to share mentioned incentives as potentially increasing their willingness to share. This indicates a “trade-off” between privacy and payment, although the exact value of individual privacy is difficult to determine [].

Our findings regarding higher willingness to share health app data than to share other types of internet use data are in line with previous research []. For example, Ackermann et al [] found that a better “intuitive match” between the type of data requested and what the data are used for increases willingness to share internet use data. In this survey study, this would be the case for health app data but not necessarily for online purchases or location data, which participants were less willing to share. However, researchers developing risk prediction models based on or including internet use data might rely on changes in online behavior beyond concrete health app data (eg, if an individual repeatedly searches for nonspecific symptoms or home remedies). As individuals are likely to look up symptoms online or change the tone and semantics of their searches long before visiting their general practitioner, capturing these early changes might benefit early detection [-].

Types of data that are not directly related to medical contexts might also be seen as more personal and more likely to be misused, such as by marketing companies []. Concerns about privacy and misuse of data were associated with a lower willingness to share across all conditions in this study. They were commonly endorsed and reported in both the quantitative and qualitative part of the survey, even by participants who were “definitely” willing to share. The Smart Data Research UK report [] also shows that the public is skeptical about private sector motivation and concerned about potential inequalities and data security.

Implications for Future Research and Practice

There is a clear need for transparent and open communication around the purpose of using internet data for early detection research, as well as data processing after it is shared, including which data, how, and why they are collected and stored [,]. Researchers and data scientists need to work toward clarifying which data are needed for what purpose, how to limit data sharing to what is necessary and filter out any other information, and how to communicate purpose and process of data sharing, especially concerning minority groups [,]. We found no effect of showing participants a pictorial example of internet use data on willingness to share. Our PPI group noted that the example only included a generic internet history without particularly sensitive data (eg, regarding financial details or a child’s local school). Future research could examine the effect of different types of examples, preferably using participants’ own internet use data.

Only 10% of participants in our sample noted that they knew a lot about internet use data, and 34% to 44% mistakenly believed that internet searches in “incognito” mode, online banking details, and content of emails are part of internet use data. This indicates that internet users might not be well-informed about the personal data they might be sharing online [,]. There is a need for more public information and education around what internet use data entail and what would need to be accessed for medical research to facilitate informed consent []. Our findings suggest that higher familiarity and, potentially, knowledge might negatively impact willingness to share internet use data for medical research. However, this should be interpreted with caution, as univariate analyses indicated the opposite trend. Actual familiarity and knowledge about internet use data and the consequences for willingness to share data should be investigated in more detail. Clear communication including which data are shared with whom and ensuring personal control are central to enable informed participation in internet research [].

Our findings show how concerns around sharing internet use data might be more pronounced in some ethnic minority groups and corroborate research showing that cancer is associated with fear and stigma in minority communities []. Future research should focus on mitigation strategies to avoid inequalities in this new stream of internet-based medical research, especially for vulnerable populations. Stigma around mental health conditions might drive the findings that younger age and male gender were associated with a higher willingness to share internet use data for the detection of depression []. As traditional norms of masculinity can reduce symptom expression and inhibit help-seeking for mental health conditions [,], sharing internet use data might be a more anonymous way to disclose potential symptoms.

Although many participants reported having concerns about sharing internet use data, a high proportion did not specify their concerns. This might have been enforced by the wording of the open-ended question around concerns but could also reflect a general unease with data sharing due to a lack of knowledge and transparency []. A qualitative study identified similar concerns around data security and misuse [] as well as actions taken by individuals to protect their privacy, such as use of incognito mode for sensitive searches, that could impact future research. Future research, especially qualitative research, should examine how these concerns can be mitigated in more detail. Additionally, those who reported being “definitely not” willing to share data often said “nothing” would make them more willing. Future research should therefore investigate why some people would not be willing to share internet use data under any circumstances.

Strengths and Limitations

This study is a large survey with population-based quotas that builds on a previous qualitative study [], previously used measures [,,], and extensive patient and public input. However, the survey assessed self-reported willingness to share rather than the actual behavior, as well as self-reported diagnosis, which might have led to an overestimation of those with a confirmed clinical diagnosis, especially in those reporting to have depression []. It remains to be seen whether these results will be confirmed in future research and practice when shared internet use data are required.

We recruited through an online panel and therefore included participants who are comfortable answering online research surveys for incentives and attracted those with higher education (approximately 52% with at least a degree level of education, compared with the UK average of 33%‐34%, []). This might have contributed to the high willingness to share internet use data and frequent mentioning of incentives to increase willingness in the open-ended questions. Nevertheless, internet use data can only be shared by people who are using the internet in the first place, and internet use and knowledge about internet use data varied in the sample, suggesting the sample represents a range of internet users. The power for some of the logistic regression analyses might have been too low to detect differences. Although we ensured a representative percentage of the sample (approximately 17%) was from ethnic minority groups, the numbers within minority ethnic groups limit accurate comparisons.

Other potential methodological limitations due to the short survey design include the use of area-based index of multiple deprivation quintiles for individual-level data, potentially leading to ecological fallacy as individuals living in affluent areas may still experience deprivation and vice versa []. Nevertheless, it is still the most comprehensible method that could be applied in a short questionnaire, and sensitivity analyses were used to replicate logistic regression analyses without this deprivation variable []. We combined those “definitely” and “probably” (not) willing to share for analyses, although the groups might differ. Dichotomizing questions about willingness to share internet use data might have reduced the nuance of the data.

Finally, we had to exclude a substantial proportion (16.6%) of responses due to low data quality, including those flagged as bots and fraudulent responses and those failing attention checks. This does not negatively impact the quality of our final sample, as we were able to exclude and replace these participants during recruitment. However, this shows that researchers will need to account for “impostor” responses when planning for sample sizes for online surveys, implement strategies to detect low quality data (eg, attention checks or bot detection), and carefully check responses, leading to additional workload. If low-quality responses are not addressed, we may not be able to trust similar research results [].

Conclusion

This study is innovative as the first large-scale survey, to the best of our knowledge, to examine public willingness to share internet use data for early detection research across multiple common health conditions while assessing perceived benefits, risks, and sociodemographic differences. This survey indicated willingness to share internet use data is high across research on early detection of different common health conditions. Willingness to share was associated with perceived benefits for oneself and society in general and limited by concerns around data privacy and potential for misuse. Participants were more willing to share data that were plausibly related to the purpose of early detection of health conditions (eg, health app data).

The survey results advance the understanding of participation in research using internet use data by identifying both barriers and enablers of data sharing. Although sociodemographic differences were limited, variations by age, gender, and ethnicity suggest potential inequities. The findings inform consent design, communication strategies, and data governance, highlighting the need for clear purpose limitation and robust data protection to foster trust and reduce inequalities. They emphasize the need to clarify which and how data are used for research into early detection and investigate how communication about internet use data could improve willingness to share while avoiding perpetuating inequalities. Data sharing needs to be limited to specific research purposes, and resolute data protection strategies need to be put in place.

We would like to thank our patient and public involvement (PPI) group for their valuable inputs during this project and for reviewing and testing the survey. We would also like to thank our participants for their time and interest in this study.

The Data Liberation Front (based at Google) was a collaborator on the study and advised on accurate description about internet use data. The research team members (CD, MR, SES) were the sole decision-makers on research design and outputs.

Google provided unrestricted funding for this research through a Privacy Research Faculty Award to SES. The funder had no role in the study design; collection, analysis, or interpretation of data; writing of the paper; and/or decision to submit for publication. SES is supported by Barts Charity (G-001520; MRC&U0036).

The full syntax and the final datasets can be found on OSF [].

SES acquired funding. SES, CD, MR and SA contributed to conceptualization. SES and CD were responsible for project administration. CD undertook the data collection and analysis, with input from SES, MR and WL. CD drafted the original manuscript. All authors contributed to writing – review and editing.

None declared.

Edited by Stefano Brini; submitted 10.Oct.2025; peer-reviewed by Natalia Calanzani, Nicola Howe; final revised version received 07.Mar.2026; accepted 07.Mar.2026; published 25.Mar.2026.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

Journal of Medical Internet Research

Tags: