Data Fix #7618
Updated by Ame Diphoko 11 months ago
h2. issue
On the EDC, in Flourish Caregiver > Ultrasound Form there is a variable `maternal_delivery_date`.
On the EDC, in Flourish Caregiver > Birth Form there is a variable `delivery_datetime`.
The values of these two variables should be equal across the CRFs, but they are not equal for 33 childpids.
For all 33 childpids, the Ultrasound Form `maternal_delivery_date` is 1 day earlier than the Birth Form `delivery_datetime`.
I confirmed that all of the Birth Form `delivery_datetime` values equal the child dob across other CRFs in the study, so the delivery date in the Ultrasound Form is the only source of discrepancy. Judging by report_datetime, the discrepant values do not cluster in any specific date range or period.
Just a thought/possibility: because the difference is 100% consistent across records, this feels more like a systemic error than human error. Is it possible that a time zone value/attribute associated with the Ultrasound Form delivery date is misaligned or mis-formatted on export?
Kate also suggested that date/time settings specific to a certain computer or computing environment could be driving the difference.
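To make the time zone hypothesis concrete, here is a minimal sketch of how a timestamp can lose a calendar day when it is converted to UTC before the date part is taken. The UTC+2 offset, the helper name `utc_date_shift`, and the sample timestamps are assumptions for illustration only; the actual EDC and export settings are not known from this issue.

```python
from datetime import datetime, timedelta, timezone

# Assumption: site clocks run at UTC+2. The real EDC/export setting is not stated here.
LOCAL_TZ = timezone(timedelta(hours=2))


def utc_date_shift(local_naive: datetime, tz: timezone = LOCAL_TZ) -> int:
    """Return (UTC calendar date - local calendar date) in days for one timestamp."""
    aware = local_naive.replace(tzinfo=tz)   # interpret the naive value as local time
    utc = aware.astimezone(timezone.utc)     # same instant, expressed in UTC
    return (utc.date() - aware.date()).days


# Made-up timestamps, not study data.
early = datetime(2023, 5, 10, 1, 30)   # 01:30 local time
late = datetime(2023, 5, 10, 14, 0)    # 14:00 local time
print(utc_date_shift(early))  # -1: the UTC date is the previous day
print(utc_date_shift(late))   # 0: same calendar day
```

If this mechanism were the cause, and under the same UTC+2 assumption, the 33 affected records would be expected to have Birth Form delivery times in the first two hours after local midnight. Checking the time component of `delivery_datetime` for those records could support or rule out the hypothesis.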
h2. Priority and follow-up
Ame - we need to confirm whether `maternal_delivery_date` is the delivery date used to compute gestational age for `ga_birth_usconfirm_us`. If so, we need to prioritize correcting the computed gestational age values, since Kate is actively using this variable to support grants and publications.
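To illustrate why this matters, here is a sketch of how a one-day error in the delivery date would carry through to gestational age. It assumes the common dating rule (GA at birth = GA at the ultrasound scan + days between scan and delivery). Whether `ga_birth_usconfirm_us` is actually derived this way is exactly what needs confirming; the function name and all values are hypothetical.

```python
from datetime import date


def ga_at_birth_days(ga_at_scan_days: int, scan_date: date, delivery_date: date) -> int:
    """Hypothetical rule: GA at birth = GA at scan + days elapsed from scan to delivery."""
    return ga_at_scan_days + (delivery_date - scan_date).days


# Made-up values, not study data: a scan at 20 weeks 0 days (140 days).
scan = date(2023, 1, 1)
ga_correct = ga_at_birth_days(140, scan, date(2023, 5, 10))  # Birth Form date
ga_shifted = ga_at_birth_days(140, scan, date(2023, 5, 9))   # date one day earlier
print(ga_correct - ga_shifted)  # 1
```

Under this assumed rule, each of the 33 records would understate gestational age at birth by exactly one day.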
h2. code to reproduce issue
see attachments
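The attached script and notebook are the actual reproduction code. For readers without the attachments, the comparison can be sketched roughly as follows, using only the standard library. The key column name `childpid`, the ISO date formats, and the function name `find_date_mismatches` are assumptions and may not match the real exports.

```python
from datetime import date, datetime


def find_date_mismatches(ultrasound_rows, birth_rows, key="childpid"):
    """Return {key: day_difference} for records whose two delivery dates differ.

    day_difference = ultrasound date - birth date, in days
    (negative means the Ultrasound Form date is earlier).
    """
    birth_dates = {
        row[key]: datetime.fromisoformat(row["delivery_datetime"]).date()
        for row in birth_rows
    }
    mismatches = {}
    for row in ultrasound_rows:
        pid = row[key]
        if pid not in birth_dates:
            continue  # no Birth Form record to compare against
        us_date = date.fromisoformat(row["maternal_delivery_date"])
        diff = (us_date - birth_dates[pid]).days
        if diff != 0:
            mismatches[pid] = diff
    return mismatches


# Made-up rows for illustration; real rows would come from csv.DictReader on the exports.
ultrasound = [
    {"childpid": "A-1", "maternal_delivery_date": "2023-05-09"},
    {"childpid": "A-2", "maternal_delivery_date": "2023-06-01"},
]
birth = [
    {"childpid": "A-1", "delivery_datetime": "2023-05-10 01:30:00"},
    {"childpid": "A-2", "delivery_datetime": "2023-06-01 14:00:00"},
]
print(find_date_mismatches(ultrasound, birth))  # {'A-1': -1}
```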
h2. attachments
- the 2 raw csv exports I used to find the issue.
- merged csv output with the 33 records containing anomalous values (also contains merged values for child dob from Flourish Child > Birth Data and Flourish Caregiver > Caregiver consent)
- code needed to reproduce the 33 discrepant records, as both a .py file and a Jupyter notebook (was not sure what your preference is!)