My efforts in the area of AI SEA (Safety, Ethics and Alignment) have broadly fallen into two areas: (1) model auditing, red-teaming and adversarial attacks, and (2) dataset curation, auditing and bias mitigation.
A sampling of my research publications spanning these two areas:
Model auditing, Red-Teaming and Adversarial attacks
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
Model weight theft with just noise inputs: The curious case of the petulant attacker
Understanding adversarial robustness through loss landscape geometries
Vulnerability of deep learning-based gait biometric recognition to adversarial perturbations
On detecting adversarial inputs with entropy of saliency maps
Art-attack! on style transfers with textures, label categories and adversarial examples
Smile in the face of adversity much? A print based spoofing attack
OODles of ODDs: The landscape of Out-of-distribution vulnerabilities of vision models
Did They Direct the Violence or Admonish It? A Cautionary Tale on Contronomy, Androcentrism and Back-Translation Foibles [Video]
Dataset curation, auditing and bias mitigation
PS: My more recent AI safety chronicles can be found here