Research infrastructure · v0.9 · invite-only

From qualitative narratives to citable research data.

NarraGrid is a collaborative platform for turning interviews, memories, and open-ended responses into structured datasets, with prompts, benchmarks, and findings you can cite.

Used by 14 research labs · 3,184 cases scored · 83 external citations
prompt · tr_mem_valence · v1.2
I remembered the summer kitchen at my grandmother's house, the smell of pickled vine leaves, and how she would hum while folding them.
↓ claude-sonnet-4.5
{
"valence": 2.4,
"confidence": 0.88,
"rationale": "warm sensory recall"
}

The pipeline

Five steps from raw text to a dataset.

The workflow preserves the method around every result: the source text, prompt version, model provider, validation rules, and human benchmark.

01

Upload narratives

CSV, Excel, open-ended survey responses, interview excerpts, or field notes.

02

Define the framework

Capture codebooks, rubrics, rating scales, and exclusion criteria as method.

03

Version the prompt

Build a structured instrument with schema validation and change history.

04

Score with models

Run GPT, Claude, Gemini, or local models against the same research task.

05

Benchmark with humans

Compare against expert raters and report r, kappa, F1, MAE, and n.
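
As an illustration of step 05, the agreement statistics can be computed with standard tools once model scores and expert ratings sit side by side. The scores below are hypothetical, and rounding the -3 to +3 scale to integer codes for kappa and F1 is an assumption made for this sketch, not NarraGrid's procedure.

# Minimal sketch: agreement between model scores and expert ratings.
# All data here are made up; only the metrics mirror the ones named above.
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score, f1_score, mean_absolute_error

expert_scores = [2.0, -1.0, 1.0, 3.0, -2.0, 1.0, 2.0]   # hypothetical expert valence ratings
model_scores  = [2.4, -0.8, 0.6, 2.7, -2.2, 1.8, 1.2]   # hypothetical model ratings, same cases

r, _ = pearsonr(model_scores, expert_scores)             # Pearson correlation
mae = mean_absolute_error(expert_scores, model_scores)   # mean absolute error

# kappa and F1 need categorical codes, so round the continuous scale to integers.
expert_codes = [round(s) for s in expert_scores]
model_codes  = [round(s) for s in model_scores]
kappa = cohen_kappa_score(expert_codes, model_codes)
f1 = f1_score(expert_codes, model_codes, average="macro")

print(f"n={len(expert_scores)}  r={r:.3f}  kappa={kappa:.2f}  F1={f1:.2f}  MAE={mae:.2f}")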

Public artifacts

Prompts, benchmarks, and findings that can be cited.

Treat the research instrument as a first-class object. Publish the prompt, benchmark, or result with provenance attached.

prompt · public / v1.2

Turkish autobiographical memory valence rating

Kavdir, A. (2026). Turkish autobiographical memory valence prompt, v1.2. NarraGrid.

Rates each memory on a -3 to +3 scale with rationale. Validated against expert ratings.

r .724
kappa .81
cited 47x
benchmark · public / v1.0

Self-defining memory specificity, three-class

Narrative Identity Lab. (2026). Specificity benchmark, v1.0. NarraGrid.

Compares four LLMs against expert-coded memories using a shared codebook.

F1 .81
kappa .76
n 200
finding · public / v1.1

Model comparison on Turkish narrative valence

Kavdir, A., & Celik, M. (2026). Turkish valence model comparison, v1.1. NarraGrid.

A same-prompt comparison of Claude, GPT, and Gemini against human ratings.

models 3
delta r .14
runs 18

For research labs

A workflow built for methods sections.

structure

Turn text into variables.

  • Score narrative material on valence, specificity, agency, coherence, trust, or your own rubric.
  • Export datasets with variable names, codebook notes, and provenance attached.
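
One way to picture that export: a tidy table with one row per scored narrative, one column per variable, and the provenance carried alongside. The column names and values below are illustrative, not NarraGrid's actual export schema.

# Hypothetical export layout; every name and value here is an assumption.
import pandas as pd

rows = [
    {"case_id": "mem_001", "valence": 2.4, "specificity": 1,
     "prompt_version": "v1.2", "model": "claude-sonnet-4.5", "coder": "model"},
    {"case_id": "mem_001", "valence": 2.0, "specificity": 1,
     "prompt_version": "v1.2", "model": None, "coder": "expert_1"},
]
pd.DataFrame(rows).to_csv("valence_export.csv", index=False)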

benchmark

Compare LLMs like raters.

  • Run multiple models on the same task and inspect where their judgments diverge.
  • Calibrate against expert ratings and report the measurement behind every claim.
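
A rough sketch of what comparing LLMs like raters can look like in practice: lay the per-model scores out as columns, flag the cases where the models diverge most, and correlate each model with the expert column. Column names and scores are hypothetical.

# Hypothetical wide table: one score column per model plus the expert ratings.
import pandas as pd

scores = pd.DataFrame({
    "expert": [2.0, -1.0, 1.0, 3.0, -2.0],
    "claude": [2.4, -0.8, 0.6, 2.7, -2.2],
    "gpt":    [1.9, -1.5, 1.2, 2.0, -1.0],
    "gemini": [2.1, -1.0, 0.2, 2.8, -2.5],
})
model_cols = ["claude", "gpt", "gemini"]

# Spread between the highest and lowest model score per case: where judgments diverge.
scores["divergence"] = scores[model_cols].max(axis=1) - scores[model_cols].min(axis=1)
print(scores.sort_values("divergence", ascending=False))

# Per-model correlation with the expert ratings: the calibration behind the claim.
print(scores[model_cols].corrwith(scores["expert"]))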

publish

Cite methods directly.

  • Publish prompts, benchmarks, datasets, and findings as stable versioned artifacts.
  • Let other labs reuse the instrument instead of reverse-engineering your appendix.

Worked example

A prompt can carry evidence, not just instructions.

Every artifact keeps the receipt: which model, which dataset, which evaluation, and which version produced the result.
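
One way to picture that receipt is a small provenance record attached to the artifact. The fields and values below are illustrative, not a NarraGrid schema.

# Hypothetical provenance record for a published finding; every field is an assumption.
finding_provenance = {
    "artifact": "finding/turkish-valence-model-comparison",
    "version": "v1.1",
    "prompt": {"id": "tr_mem_valence", "version": "v1.2"},
    "dataset": {"id": "tr_memories", "n": 200},
    "models": ["claude-sonnet-4.5", "gpt", "gemini"],
    "evaluation": {"benchmark": "expert_ratings", "metrics": ["r", "kappa", "MAE"]},
}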

Disciplines

Where open text becomes evidence.

Psychology · Sociology · Political science · Anthropology · Education · Communication studies · Public health · Policy research · Organizational research · Humanities · Survey methodology · Market research

Bring a dataset. Leave with a method other labs can cite.

early access · academic pricing · institutional invoicing available