Abstract
Can the way we use language reveal the cognitive effort we’re expending? This project investigates whether it’s possible to distinguish between two different cognitive states—Normal Reading (NR) and Task-Specific Reading (TSR)—using only the linguistic properties of a text. Starting with classical machine learning models that achieved a baseline F1-score of ~0.74, we developed a novel hybrid approach. By combining hand-crafted syntactic features with powerful embeddings from a fine-tuned BERT transformer, our final MLP-based model successfully classified the reading state with an F1-score of 0.9474, demonstrating a strong, quantifiable link between language patterns and cognitive load.

1. The Scientific Question: Can Text Reveal the Mind’s State?
The human brain is not a static processor; it adapts its strategy based on the task at hand. When we read casually, our cognitive state is different from when we read to find a specific piece of information. This latter state, known as Task-Specific Reading (TSR), involves a higher degree of attention and cognitive effort.
Using the comprehensive ZuCo 2.0 dataset, which uniquely pairs text with corresponding EEG and eye-tracking data, our research posed a challenging question: Can we bypass the direct neural recordings and identify the cognitive state of the reader purely from the text they are processing? In other words, does increased cognitive effort leave a detectable “fingerprint” on the linguistic and structural properties of language?
2. Initial Approach: Classical Models and Feature Engineering
Our first phase involved a systematic evaluation of classical machine learning models (such as Logistic Regression, RandomForest, and SVM). We explored two distinct feature-engineering paths, both sketched in the code example after the list:
- A) LLM-based Embeddings: We used a pre-trained sentence transformer (all-MiniLM-L6-v2) to create dense vector representations of each sentence, capturing its semantic meaning.
- B) Discrete Linguistic Features: We engineered a rich set of features, including readability scores (e.g., Flesch Reading Ease), lexical diversity (e.g., TTR), and advanced syntactic complexity metrics derived using spaCy (e.g., dependency distance, POS tag counts).
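A minimal sketch of how both paths could look is shown below; the specific feature choices and helper functions are illustrative assumptions, not the project’s exact code.

```python
# Sketch of the two feature-engineering paths; the feature choices below are
# illustrative assumptions, not the project's exact feature set.
import numpy as np
import spacy
import textstat
from sentence_transformers import SentenceTransformer

nlp = spacy.load("en_core_web_sm")
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# A) Dense semantic embeddings from the pre-trained sentence transformer.
def embed_sentences(sentences):
    return encoder.encode(sentences)  # shape: (n_sentences, 384)

# B) Discrete linguistic features: readability, lexical diversity, syntax.
def discrete_features(sentence):
    doc = nlp(sentence)
    tokens = [t for t in doc if not t.is_punct]
    n = max(len(tokens), 1)
    return [
        textstat.flesch_reading_ease(sentence),              # readability
        len({t.lower_ for t in tokens}) / n,                 # type-token ratio (TTR)
        float(np.mean([abs(t.i - t.head.i) for t in doc])),  # mean dependency distance
        sum(t.pos_ == "NOUN" for t in doc),                  # POS tag counts ...
        sum(t.pos_ == "VERB" for t in doc),
        len(tokens),                                         # sentence length
    ]
```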
These initial models performed reasonably well, with the RandomForest classifier using enhanced discrete features achieving the best baseline F1-score of approximately 0.74 for the TSR class. This confirmed that linguistic features hold predictive power, but we believed substantially better performance was achievable.
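A baseline of this kind might be scored as in the sketch below, where X_discrete and y stand for the engineered feature matrix and the NR/TSR labels; the label encoding and hyperparameters are assumptions.

```python
# Baseline sketch: RandomForest on the discrete features, scored with the
# F1 of the TSR class (label encoding 0 = NR, 1 = TSR is an assumption).
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_score

def baseline_tsr_f1(X_discrete, y):
    """Cross-validated TSR F1 for a RandomForest baseline."""
    tsr_f1 = make_scorer(f1_score, pos_label=1)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    clf = RandomForestClassifier(n_estimators=300, random_state=42)
    scores = cross_val_score(clf, X_discrete, y, cv=cv, scoring=tsr_f1)
    return scores.mean(), scores.std()
```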
3. The Core Innovation: Fine-Tuning a Transformer for Task-Specific Embeddings
The key limitation of the initial approach was that both the off-the-shelf embeddings and the classical models were “task-agnostic.” To reach the next level of performance, we needed representations that were specifically attuned to the nuances of our dataset.
The breakthrough came from fine-tuning a bert-base-uncased model. Using the powerful Hugging Face transformers library and PyTorch, we re-trained the model on our specific NR/TSR classification task. This process, validated robustly using 5-fold Stratified Cross-Validation, transformed the general-purpose BERT into a specialist. The embeddings extracted from this fine-tuned model were no longer just general representations of language; they were highly discriminative features, optimized to separate the two cognitive states.
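As a rough sketch of this step (the hyperparameters, the per-fold train_ds / val_ds dataset objects, and the embedding helper below are assumptions, not the project’s exact training code):

```python
# Sketch: fine-tune bert-base-uncased on the NR/TSR task for one CV fold,
# then extract 768-dim sentence embeddings from the fine-tuned encoder.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = NR, 1 = TSR (assumed encoding)

def finetune_fold(train_ds, val_ds):
    """Fine-tune on one stratified CV fold (datasets assumed pre-tokenized)."""
    args = TrainingArguments(output_dir="bert_nr_tsr", num_train_epochs=3,
                             per_device_train_batch_size=16, learning_rate=2e-5)
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_ds, eval_dataset=val_ds)
    trainer.train()
    return model

@torch.no_grad()
def finetuned_embeddings(sentences):
    """Return the [CLS] vector of the fine-tuned encoder for each sentence."""
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = model.bert(**enc).last_hidden_state  # (batch, seq_len, 768)
    return hidden[:, 0, :]                        # one 768-dim vector per sentence
```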
4. The Hybrid Model: Combining Learned Representations with Explicit Rules
Our final and most successful model architecture was a hybrid one. We hypothesized that combining the deep, contextual understanding of the fine-tuned BERT with the explicit, rule-based nature of our hand-crafted linguistic features would yield the best results.
We concatenated the two feature sets:
- Fine-Tuned Embeddings (768 dimensions): Capturing the semantic and contextual essence of the sentence.
- Scaled Enhanced Discrete Features (~19 dimensions): Capturing explicit rules about syntax, readability, and lexical diversity.
This combined feature vector was then fed into an advanced Multi-Layer Perceptron (MLP), architected in PyTorch with regularization techniques like Dropout and BatchNorm to prevent overfitting.
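A minimal sketch of such a hybrid head is given below; the layer sizes and dropout rate are assumptions rather than the notebook’s exact architecture.

```python
# Sketch of the hybrid MLP head: concatenates the fine-tuned BERT embedding
# with the scaled hand-crafted features, then classifies NR vs. TSR.
import torch
import torch.nn as nn

class HybridMLP(nn.Module):
    def __init__(self, emb_dim=768, feat_dim=19, hidden=256, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim + feat_dim, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, 64),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(64, 2),  # logits for NR vs. TSR
        )

    def forward(self, bert_emb, discrete_feats):
        # Concatenate learned representations with explicit linguistic features.
        x = torch.cat([bert_emb, discrete_feats], dim=1)
        return self.net(x)
```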
5. Results & Conclusion: A Resounding Success
The final hybrid model was evaluated on a held-out test set, achieving a remarkable F1-score of 0.9474 for the TSR class.
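For reference, a per-class F1 of this kind can be computed as in the short sketch below; the label encoding (TSR = 1) and variable names are assumptions.

```python
# Held-out evaluation sketch: y_test / y_pred are the true and predicted
# labels on the test split.
from sklearn.metrics import classification_report, f1_score

def evaluate(y_test, y_pred):
    print(classification_report(y_test, y_pred, target_names=["NR", "TSR"]))
    return f1_score(y_test, y_pred, pos_label=1)  # F1 for the TSR class
```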
This significant leap in performance from our baseline models provides a strong answer to our initial research question. Yes, cognitive effort leaves a quantifiable and highly predictive fingerprint in our use of language. This finding has exciting implications, suggesting a future where non-invasive, text-based digital biomarkers could be developed to assess cognitive load, attention, or even mental fatigue.
Technology Stack & Code
This research utilized a modern, robust stack of data science and AI tools.
- Core Libraries: Pandas, NumPy, Scikit-learn, NLTK, spaCy, Textstat
- Deep Learning: PyTorch, Hugging Face Transformers
- Models: Logistic Regression, RandomForest, SVM, LightGBM, MLP, BERT
- Methodology: Stratified K-Fold Cross-Validation, Fine-Tuning, Feature Engineering
The complete methodology, code, and analysis are available for review in the project’s Jupyter Notebook.
