
feat: state machine for scrape task producer state mgmt#115

Open
extreme4all wants to merge 4 commits into develop from f/scrape-task-state-machine

Conversation

extreme4all (Contributor) commented Apr 10, 2026

Replaces imperative FetchParams mutation with a generic state machine. Good separation of concerns: sm.py (generic engine), structs.py (enums/context), states.py (transition declarations), core.py (orchestration). Roughly 120 lines cut from core.

Summary

Refactor scrape task producer state management from mutable FetchParams to an explicit state machine pattern.

Changes

  • Add generic state machine (sm.py): StateMachine[S, E, C] with enum states/events, context object, decorator-based transition registration.
  • Extract domain types (structs.py): ScrapeState enum (NORMAL → POSSIBLE_BAN → CONFIRMED_BAN → DONE), ScrapeEvent enum (FETCH_MORE, REDUCE_DAYS, NEXT_STEP, NEW_DAY), ScraperCtx dataclass replacing FetchParams.
  • Declare transitions (states.py): All state transitions and side effects via @scraper_sm.transition decorators.
  • Simplify core (core.py): Replace determine_fetch_params (mutation) with determine_event (pure) + scraper_sm.handle (transition). Remove run_async/run wrappers, inline main() setup.
  • Rewrite tests: Test state transitions and event determination independently.
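The generic engine described in the first bullet could look roughly like the sketch below; it is an illustrative reconstruction based on the snippets quoted later in this thread, not the PR's actual sm.py (everything beyond StateMachine, the transition decorator, and handle is an assumption):

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Generic, TypeVar

S = TypeVar("S", bound=Enum)  # state enum
E = TypeVar("E", bound=Enum)  # event enum
C = TypeVar("C")              # context object

Action = Callable[[Any], None]  # side effect run on a transition


@dataclass
class StateMachine(Generic[S, E, C]):
    """Table-driven state machine keyed by (state, event) pairs."""

    transitions: dict[tuple[S, E], tuple[S, Action]] = field(default_factory=dict)

    def transition(self, from_state: S, event: E, to_state: S):
        """Decorator: register func as the action for (from_state, event)."""
        def decorator(func: Action) -> Action:
            self.transitions[(from_state, event)] = (to_state, func)
            return func
        return decorator

    def handle(self, ctx: C, state: S, event: E) -> S:
        """Run the registered action for (state, event), return the next state."""
        to_state, func = self.transitions[(state, event)]
        func(ctx)
        return to_state
```

The table-driven layout keeps the engine fully generic: all domain knowledge lives in the decorated functions registered from states.py.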

State flow

NORMAL → POSSIBLE_BAN → CONFIRMED_BAN → DONE
  ↑                                    |
  └──────────── NEW_DAY ───────────────┘

Each state narrows the date window (REDUCE_DAYS) until exhausted, then advances (NEXT_STEP). FETCH_MORE pages via cursor.
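The diagram and the REDUCE_DAYS / NEXT_STEP / FETCH_MORE behaviour can be written out as a plain transition table. This is an illustrative reconstruction of the flow, not the PR's actual states.py:

```python
from enum import Enum, auto


class ScrapeState(Enum):
    NORMAL = auto()
    POSSIBLE_BAN = auto()
    CONFIRMED_BAN = auto()
    DONE = auto()


class ScrapeEvent(Enum):
    FETCH_MORE = auto()
    REDUCE_DAYS = auto()
    NEXT_STEP = auto()
    NEW_DAY = auto()


# (state, event) -> next state, following the diagram above.
FLOW = {
    (ScrapeState.NORMAL, ScrapeEvent.NEXT_STEP): ScrapeState.POSSIBLE_BAN,
    (ScrapeState.POSSIBLE_BAN, ScrapeEvent.NEXT_STEP): ScrapeState.CONFIRMED_BAN,
    (ScrapeState.CONFIRMED_BAN, ScrapeEvent.NEXT_STEP): ScrapeState.DONE,
    (ScrapeState.DONE, ScrapeEvent.NEW_DAY): ScrapeState.NORMAL,
}

# REDUCE_DAYS (narrow the date window) and FETCH_MORE (page via cursor)
# are self-transitions in every active state.
for s in (ScrapeState.NORMAL, ScrapeState.POSSIBLE_BAN, ScrapeState.CONFIRMED_BAN):
    for e in (ScrapeEvent.REDUCE_DAYS, ScrapeEvent.FETCH_MORE):
        FLOW[(s, e)] = s
```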

Notes

fetch_more uses three decorators (one per state → self) as a workaround for the missing self-transition concept in the generic SM. Consider adding a transition_to_self helper or allowing to_state=None to mean "stay."
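One possible shape for the suggested to_state=None variant — a standalone sketch with made-up State/Event enums and a minimal engine, not the PR's code:

```python
from enum import Enum, auto


class State(Enum):
    A = auto()
    B = auto()


class Event(Enum):
    PING = auto()  # self-transition
    STEP = auto()  # advance


class SM:
    def __init__(self):
        self.transitions = {}

    def transition(self, from_state, event, to_state=None):
        # to_state=None means "stay in from_state", making self-transitions
        # explicit instead of repeating the same state in both positions.
        def decorator(func):
            self.transitions[(from_state, event)] = (
                from_state if to_state is None else to_state,
                func,
            )
            return func
        return decorator

    def handle(self, ctx, state, event):
        to_state, func = self.transitions[(state, event)]
        func(ctx)
        return to_state


sm = SM()


@sm.transition(State.A, Event.PING)  # no to_state: stays in A
def on_ping(ctx):
    ctx["pings"] = ctx.get("pings", 0) + 1


@sm.transition(State.A, Event.STEP, State.B)
def to_b(ctx):
    pass
```

A decorator per from_state is still needed, but the intent ("this event does not change state") is carried by the signature rather than by duplicating the state name.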

@extreme4all changed the title from "feat: implement state machine for scrape task state management and re…" to "feat: state machine for scrape task producer state mgmt" on Apr 10, 2026
RusticPotatoes (Contributor) left a comment

My notes

possible_ban=ctx.possible_ban,
confirmed_ban=ctx.confirmed_ban,
or_none=state == ScrapeState.NORMAL,
first_date=ctx.first_date,

What's ctx? Can we just spell it out or

fp.reset_for_new_day(max_days)
state = scraper_sm.handle(ctx, state, ScrapeEvent.NEW_DAY)

if state == ScrapeState.DONE:

I like this approach

transitions: dict[tuple[S, E], tuple[S, Action]] = field(default_factory=dict)

def add_transition(
    self, from_state: S, event: E, to_state: S, func: Action

Isn't S from state and the same var S to state going to cause an issue?

# fmt: off
@scraper_sm.transition(ScrapeState.CONFIRMED_BAN, ScrapeEvent.NEXT_STEP, ScrapeState.DONE)
# fmt: on
def to_done(ctx: ScraperCtx) -> None:

Why is done no ban state?
