Getting started

This section is the documentation hub for the PorosData document stack. The manuals for Parser, Processor, and Designer live here in pipeline order (raw extraction → quality preparation → structured delivery), together with shared setup pages that apply across stages.

Three-stage manuals

Work moves from PDFs and engine output to review-ready packages. Each stage has its own overview and reference pages, all listed under the Getting started branch of the site navigation.

Stage	Role	Start here
Parser	Extracts blocks, figures, and content lists into the Raw Database	Parser overview · Dataset layout
Processor	Cleans and stabilises lists into the Processed Database	Processor overview · CLI reference · Configuration and runtime
Designer	Exports tagged views, structure JSON, and multimodal indexes into the Designed Database	Designer overview · Delivery standards · CLI reference
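The stage order and the database each stage writes to can be captured in a small lookup, which is sometimes handy when scripting checks around the pipeline. The sketch below is illustrative only: the `Stage` class and `input_database` helper are hypothetical names, not part of any PorosData package, and the idea that each stage reads the database written by the previous one is an assumption drawn from the pipeline order described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    """Hypothetical record pairing a stage with the database it writes."""
    name: str
    output_database: str


# Pipeline order as listed in the table above.
PIPELINE = (
    Stage("Parser", "Raw Database"),
    Stage("Processor", "Processed Database"),
    Stage("Designer", "Designed Database"),
)


def input_database(stage_name: str) -> Optional[str]:
    """Return the database the named stage is assumed to read from.

    Assumption: each stage consumes the output of the stage before it.
    The first stage (Parser) works from PDFs, so it has no input database.
    Raises ValueError for an unknown stage name.
    """
    names = [stage.name for stage in PIPELINE]
    if stage_name not in names:
        raise ValueError(f"unknown stage: {stage_name}")
    index = names.index(stage_name)
    if index == 0:
        return None
    return PIPELINE[index - 1].output_database


if __name__ == "__main__":
    for stage in PIPELINE:
        print(stage.name, "reads", input_database(stage.name), "writes", stage.output_database)
```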

For a single narrative that walks the directories and artefacts from end to end, use End-to-end workflow.

Shared setup and examples

These pages focus on environment, first commands, and copyable scenarios. They often emphasise Processor and Designer CLIs because those packages ship on PyPI; Parser setup and dashboard commands are maintained in the upstream gen-sci-data repository (see the Parser overview).

Installation — Python environment, porosdata-processor, sanity checks
Quick Start — TextCleaner, optional YAML config, batch cleaning CLI
Examples — single-document trial, batch runs, Processor → Designer handoff
End-to-end workflow — Raw / Processed / Designed Database layout and what to inspect
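As a rough illustration of the kind of sanity check the Installation page refers to, the sketch below uses only the Python standard library to test whether the `porosdata-processor` distribution is present in the active environment. The helper name `installed_version` is hypothetical, and the official sanity checks on the Installation page may differ.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution name taken from the Installation entry above.
    found = installed_version("porosdata-processor")
    if found is None:
        print("porosdata-processor is not installed in this environment")
    else:
        print(f"porosdata-processor {found} is installed")
```

This only confirms that package metadata is visible to the interpreter; it does not exercise the CLI or any cleaning functionality.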

Where to read next

Terminology and integration notes: Glossary · API Reference. Project home: PorosData home.