Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mergetag parsing #29

Merged
merged 2 commits into from
Sep 29, 2022
Merged

Mergetag parsing #29

merged 2 commits into from
Sep 29, 2022

Conversation

bk2204
Copy link
Member

@bk2204 bk2204 commented Sep 28, 2022

Git contains some multiline headers, including gpgsig, gpgsig-sha256, and mergetag headers. These headers start with a normal line of data and then are continued after a newline by indentation with a single space, which is stripped when computing the body.

Our header parsing for commits could misparse these because we use strings.Fields, which strips leading space. As a result, if the mergetag message contains a line starting with "tree " and then a non-hex character, we'd fail parsing since we'd attempt to parse it as a tree header. Let's fix this by using strings.Split instead, which doesn't strip space and therefore won't try to misparse this.

The fake commit is synthesized from be122abe4bcd6d39b37892daae28c8bf5e4030fc in the Linux kernel repository. Identifying information and most of the text are removed for privacy and license reasons.

@bk2204 bk2204 requested a review from a team September 28, 2022 15:51
Right now, the error message when trying to parse an invalid tree or
parent header is hard to pin down because it's simply a generic hex
string parsing error.  Make the error easier to discover in the future
by adding some descriptive text to indicate to the user what the problem
is.
Git contains some multiline headers, including `gpgsig`,
`gpgsig-sha256`, and `mergetag` headers.  These headers start with a
normal line of data and then are continued after a newline by
indentation with a single space, which is stripped when computing the
body.

Our header parsing for commits could misparse these because we use
`strings.Fields`, which strips leading space.  As a result, if the
mergetag message contains a line starting with "tree " and then a
non-hex character, we'd fail parsing since we'd attempt to parse it as a
tree header.  Let's fix this by using `strings.Split` instead, which
doesn't strip space and therefore won't try to misparse this.

The fake commit is synthesized from
be122abe4bcd6d39b37892daae28c8bf5e4030fc in the Linux kernel repository.
Identifying information and most of the text are removed for privacy and
license reasons.
Copy link
Member

@chrisd8088 chrisd8088 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find and great test, thanks!

@bk2204 bk2204 merged commit 30ed74d into git-lfs:main Sep 29, 2022
@bk2204 bk2204 deleted the mergetag-parsing branch September 29, 2022 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants