When reading the healthcare artificial intelligence and big data literature, I often run across the implied premise that de-identification of patient data makes its use acceptable for large medical studies.

For the historical studies involving only hundreds of patients, this seems quite reasonable, but I do wonder whether it applies to data sets involving 1 million patients, a typical size for deep learning artificial intelligence studies.

When a patient joins a healthcare network and agrees to have an electronic health record, did they really agree to share every single line in every single element of that record for studies of which they are unaware?

Did they really give informed consent?

I am not sure of the final answers to these questions, but I am sure they are questions worth asking.