We trained our models on a dataset of paired human-written and AI-generated text. The human-written text includes student-written articles, news articles, and question-and-answer datasets covering multiple disciplines in the sciences and humanities. For each human-written article, we generate a corresponding article with AI to ensure the dataset has no topic-level bias. Finally, we train our model on an equal balance of human-written and AI-generated articles.
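The pairing scheme described above can be sketched roughly as follows. This is only an illustration, not our actual data pipeline: the function name build_paired_dataset, the label encoding (0 for human, 1 for AI), and the placeholder generator are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

HUMAN, AI = 0, 1  # assumed label encoding, chosen for this sketch


def build_paired_dataset(
    human_texts: List[str],
    generate: Callable[[str], str],
) -> List[Tuple[str, int]]:
    """Pair every human-written text with one AI-generated counterpart.

    `generate` is a hypothetical callable that returns an AI-written
    article on the same topic as its input. Because each human text
    contributes exactly one AI text, the two classes are balanced and
    cover the same topics.
    """
    examples: List[Tuple[str, int]] = []
    for text in human_texts:
        examples.append((text, HUMAN))
        examples.append((generate(text), AI))
    return examples


def placeholder_generate(text: str) -> str:
    # Stand-in for a real text generator, so the sketch runs on its own.
    return "AI-written article on the topic of: " + text


dataset = build_paired_dataset(
    ["An essay on photosynthesis", "A news report on local elections"],
    placeholder_generate,
)
```

Generating one AI article per human article keeps the class balance at exactly 50/50 and means a classifier cannot separate the classes by topic alone.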
When we set a threshold of 0.88 on the completely_generated_prob value returned by our API (human if below 0.88, AI if above 0.88), we correctly classify 99% of the human-written articles and 85% of the AI-generated articles. Our classifier achieves an AUC score (area under the ROC curve) of 0.98.
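As a rough illustration of how these figures are computed, the sketch below applies the 0.88 threshold to a list of completely_generated_prob values and measures per-class accuracy and AUC. The scores and labels are made-up toy data, not our evaluation set, and the helper names are hypothetical. The treatment of a score of exactly 0.88 is not specified above; the sketch assumes it counts as human.

```python
from typing import List, Tuple

THRESHOLD = 0.88  # threshold on completely_generated_prob


def classify(prob: float, threshold: float = THRESHOLD) -> str:
    # Above the threshold is AI; everything else is human.
    # Assumption: a score exactly equal to the threshold counts as human.
    return "ai" if prob > threshold else "human"


def per_class_accuracy(probs: List[float], labels: List[str]) -> Tuple[float, float]:
    """Return (human accuracy, AI accuracy) at the fixed threshold."""
    correct = {"human": 0, "ai": 0}
    total = {"human": 0, "ai": 0}
    for prob, label in zip(probs, labels):
        total[label] += 1
        if classify(prob) == label:
            correct[label] += 1
    return correct["human"] / total["human"], correct["ai"] / total["ai"]


def auc(probs: List[float], labels: List[str]) -> float:
    """Area under the ROC curve, computed as the fraction of (AI, human)
    pairs in which the AI text receives the higher score (ties count half)."""
    ai_scores = [p for p, l in zip(probs, labels) if l == "ai"]
    human_scores = [p for p, l in zip(probs, labels) if l == "human"]
    wins = 0.0
    for a in ai_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))


# Toy data for illustration only.
toy_probs = [0.05, 0.40, 0.91, 0.95, 0.70, 0.99]
toy_labels = ["human", "human", "human", "ai", "ai", "ai"]

human_acc, ai_acc = per_class_accuracy(toy_probs, toy_labels)
toy_auc = auc(toy_probs, toy_labels)
```

Note that the per-class accuracies depend on the chosen threshold, while AUC summarizes how well the scores separate the two classes across all possible thresholds.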