
refactor: consolidate syntax crates, introduce type syntax & core syntax #179

Merged
azjezz merged 1 commit into main from syntax
Apr 9, 2025

Conversation

azjezz
Member

@azjezz azjezz commented Apr 9, 2025

📌 What Does This PR Do?

This PR introduces a significant refactoring of the crates related to syntax processing within the Mago workspace, aiming for better organization, improved maintainability, and clearer API boundaries.

🔍 Context & Motivation

Currently, several small crates (mago_ast, mago_token, mago_lexer, mago_parser, mago_walker, mago_ast_utils) handle different aspects of PHP syntax processing. These crates are tightly coupled:

  • The parser needs the lexer, AST definitions, and tokens.
  • The lexer needs token definitions.
  • The walker needs the AST.
  • AST utils need the AST.

This strong interdependency means users often need to import most or all of these crates together, and internal changes can require updates across multiple places.

Additionally, the PHP docblock type parser, previously developed within mago_type_checker (private for now), is a valuable component that could be useful on its own. The main PHP lexer/parser and this type parser also share a set of common utilities.

🛠️ Summary of Changes

  • Consolidated mago_syntax Crate:

    • Merged the functionality of mago_ast, mago_ast_utils, mago_token, mago_lexer, mago_parser, and mago_walker into a single, unified crate named mago_syntax.
    • This crate now serves as the primary interface for PHP tokenization, parsing into an AST, AST definitions, and AST traversal utilities.
    • Rationale: Reflects the practical reality that these components are almost always used together. Simplifies dependency management for consumers of the PHP parser.
  • New mago_type_syntax Crate:

    • Introduced a new crate dedicated to parsing PHP docblock type strings.
    • Contains the type-specific lexer (TypeLexer), token definitions (TypeToken, TypeTokenKind), parser, AST, and error types.
    • Extracted and refined from the work-in-progress mago_type_checker.
    • Rationale: Exposes the type parser as a reusable component for other tools that might need to understand docblock types without needing the full PHP parser or type checker.
  • New mago_syntax_core Crate:

    • Introduced a new utility crate to hold common, low-level components shared between different lexing/parsing tasks.
    • Contains shared fundamentals like lexer input handling (e.g., the Input struct), position/span management helpers (if not already in a dedicated span crate), error primitives, and common lexing utilities (e.g., read_digits_of_base, character classification macros/functions).
    • Rationale: Avoids code duplication between mago_syntax and mago_type_syntax by providing a shared foundation (DRY principle).
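
To illustrate the layering described above, here is a minimal, self-contained sketch of a shared input cursor and digit-reading helper (the role of mago_syntax_core) reused by a tiny docblock type lexer (the role of mago_type_syntax). The names Input, read_digits_of_base, TypeToken, and TypeTokenKind come from this description, but every signature, field, and variant below is an assumption made for illustration, as is the lex_type function. None of it is the actual Mago API.

```rust
// Hypothetical sketch: mirrors the roles described in this PR, not the real crate APIs.

/// "Core" layer: a byte cursor over the source text (fields are assumed).
pub struct Input<'a> {
    bytes: &'a [u8],
    offset: usize,
}

impl<'a> Input<'a> {
    pub fn new(source: &'a str) -> Self {
        Self { bytes: source.as_bytes(), offset: 0 }
    }

    pub fn offset(&self) -> usize {
        self.offset
    }

    pub fn peek(&self) -> Option<u8> {
        self.bytes.get(self.offset).copied()
    }

    pub fn advance(&mut self) {
        self.offset += 1;
    }

    /// Consumes bytes while `pred` holds and returns the consumed slice.
    pub fn take_while(&mut self, pred: impl Fn(u8) -> bool) -> &'a [u8] {
        let bytes = self.bytes;
        let start = self.offset;
        while self.peek().map_or(false, &pred) {
            self.offset += 1;
        }
        &bytes[start..self.offset]
    }
}

/// "Core" layer: consumes leading digits valid in `base` and returns how many were read.
/// The signature is an assumption; only the function name appears in the PR text.
pub fn read_digits_of_base(input: &mut Input<'_>, base: u32) -> usize {
    input.take_while(|b| (b as char).is_digit(base)).len()
}

/// "Type syntax" layer: token kinds for a small subset of docblock types (assumed variants).
#[derive(Debug, PartialEq)]
pub enum TypeTokenKind {
    Identifier,
    Integer,
    Pipe,
    Question,
    LessThan,
    GreaterThan,
    Comma,
}

#[derive(Debug, PartialEq)]
pub struct TypeToken<'a> {
    pub kind: TypeTokenKind,
    pub value: &'a str,
}

/// Lexes a docblock type string, reusing the shared `Input` and digit helper.
/// Returns the byte offset of the first unsupported byte on failure.
pub fn lex_type(source: &str) -> Result<Vec<TypeToken<'_>>, usize> {
    let mut input = Input::new(source);
    let mut tokens = Vec::new();
    while let Some(byte) = input.peek() {
        let start = input.offset();
        let kind = match byte {
            b' ' | b'\t' => {
                input.advance();
                continue;
            }
            b'|' | b'?' | b'<' | b'>' | b',' => {
                input.advance();
                match byte {
                    b'|' => TypeTokenKind::Pipe,
                    b'?' => TypeTokenKind::Question,
                    b'<' => TypeTokenKind::LessThan,
                    b'>' => TypeTokenKind::GreaterThan,
                    _ => TypeTokenKind::Comma,
                }
            }
            b'0'..=b'9' => {
                read_digits_of_base(&mut input, 10);
                TypeTokenKind::Integer
            }
            b'a'..=b'z' | b'A'..=b'Z' | b'_' | b'\\' => {
                input.take_while(|b| b.is_ascii_alphanumeric() || matches!(b, b'_' | b'\\' | b'-'));
                TypeTokenKind::Identifier
            }
            _ => return Err(start),
        };
        tokens.push(TypeToken { kind, value: &source[start..input.offset()] });
    }
    Ok(tokens)
}
```

In this sketch, calling lex_type on a string such as `array<int, string>|null` yields identifier, punctuation, and pipe tokens without involving any PHP parser, which is the kind of standalone use the mago_type_syntax rationale describes. A full PHP lexer could build on the same Input and read_digits_of_base pieces rather than duplicating them.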

📂 Affected Areas

  • Linter
  • Formatter
  • CLI
  • Composer Plugin
  • Dependencies
  • Documentation
  • Other (please specify): Parser, Lexer, AST, Walker, AST Utilities, and Token.

🔗 Related Issues or PRs

📝 Notes for Reviewers

@azjezz azjezz added the Priority: Critical, Status: Completed, Subject: Dependencies, Type: BC Break, Type: Enhancement, Subject: Formatter, Subject: Linter, and Subject: Parser labels Apr 9, 2025
@azjezz azjezz self-assigned this Apr 9, 2025
@azjezz azjezz merged commit 3285a12 into main Apr 9, 2025
4 checks passed
@azjezz azjezz deleted the syntax branch April 9, 2025 19:28