The nature of the facial features discussed here could be analogous to any dataset that has correlation, but not necessarily causation, with the desired predictive or classification output. I’m not advocating for this, merely thinking aloud: while basing evaluations on facial features alone clearly carries the specter of prejudice and the potential for injustice, would it be different if you combined facial features into some big-data superset of other measures that each carry some degree of correlation with the output? Does that make the resulting evaluations more just? Or should ML practitioners attempt to pare down training data to just those features that have an expected causal relationship with the desired evaluation, to avoid the potential for prejudice?