
Getting started

This section is the documentation hub for the PorosData document stack. The manuals for Parser, Processor, and Designer live here in pipeline order (raw extraction → quality preparation → structured delivery), together with shared setup pages that apply across stages.

Three-stage manuals

Work moves from PDFs and engine output to review-ready packages. Each stage has its own overview and reference pages under this same Getting started branch in the site navigation.

Stage | Role | Start here
Parser | Extracts blocks, figures, and content lists into the Raw Database | Parser overview · Dataset layout
Processor | Cleans and stabilises lists into the Processed Database | Processor overview · CLI reference · Configuration and runtime
Designer | Exports tagged views, structure JSON, and multimodal indexes into the Designed Database | Designer overview · Delivery standards · CLI reference

For a single narrative that walks the directories and artefacts from end to end, use End-to-end workflow.

Shared setup and examples

These pages focus on environment, first commands, and copyable scenarios. They often emphasise Processor and Designer CLIs because those packages ship on PyPI; Parser setup and dashboard commands are maintained in the upstream gen-sci-data repository (see the Parser overview).

  • Installation: Python environment, porosdata-processor, sanity checks
  • Quick Start: TextCleaner, optional YAML config, batch cleaning CLI
  • Examples: single-document trial, batch runs, Processor → Designer handoff
  • End-to-end workflow: Raw / Processed / Designed Database layout and what to inspect

Terminology and integration notes: Glossary · API Reference. Project home: PorosData home.