I've been handed a legacy app with a mysql database to extend/upgrade. The system is a mess. Lab tests are stored in a single table, with over 200 columns containing marker values. The labtest table includes the DateTestCollected. A separate table holds patient demographics, including DateOfBirth and Sex.
In the past, we scanned lab results and identified possible abnormalities based on one or more markers. The same cutoff values were used regardless of age or sex. So -- SELECT COUNT(*) FROM labtest WHERE marker_1>$m_1_cutoff_high OR marker_1<$m_1_cutoff_low; Even with the less than ideal schema, indexes on the marker columns made this kind of query work relatively well.
New algorithms are being put into place. Instead of just looking at one or two markers, we now must look at the age & sex to determine which cutoff values to use, then apply our algorithm. I now have 12 possible variations of the same algorithm.
( age_rage='0-9' AND sex='m' AND ( marker_1 < 876 AND marker_1 > 345) ) OR
( age_rage='10-19' AND sex='m' AND ( marker_1 < 824 AND marker_1 > 312) ) OR
( age_rage='20-29' AND sex='m' AND ( marker_1 < 798 AND marker_1 > 311) )
) OR (
( age_rage='0-9' AND sex='f' AND ( marker_1 < 987 AND marker_1 > 465) ) OR
( age_rage='10-19' AND sex='f' AND ( marker_1 < 813 AND marker_1 > 404) ) OR
( age_rage='20-29' AND sex='f' AND ( marker_1 < 701 AND marker_1 > 209) )
) OR (
/* ect */
There are actually 12 variations to check against. This query is very slow, especially when iterating through 200+ markers (200 x 12 variations). EXPLAIN showed that MySQL is scanning all rows (50,000) - as soon as I applied a multi-column index (date_of_test, marker_1) MySQL stopped scanning the entire table.
Under the current schema, each marker, age, sex, and date_of_test column is indexed separately (no multi-column indexes). There are limits to the number of indexes per table.
So that's where I am. I have a legacy system in need of work, but I'm not sure of the best, most efficient way of approaching the problem.
I thought of converting the labtest table to something like (labtest_id, marker_id, marker_value), but that alone doesn't solve the issue of having to find the age & sex of each patient to determine the correct marker cutoff value.
What would be the least painful way of working through this?