Use internal Url struct in favor of url::Url to minimize the url dep in payjoin#1377
Use internal Url struct in favor of url::Url to minimize the url dep in payjoin#1377benalleng wants to merge 3 commits intopayjoin:masterfrom
Conversation
Coverage Report for CI Build 24569919310Coverage increased (+0.4%) to 84.801%Details
Uncovered Changes
Coverage Regressions2 previously-covered lines in 2 files lost coverage.
Coverage Stats
💛 - Coveralls |
This comment was marked as resolved.
This comment was marked as resolved.
7c97b6c to
8e51345
Compare
DanGould
left a comment
There was a problem hiding this comment.
We can edit the spec to say "the host component of any URL in a BIP77 message MUST be an ASCII LDH (Letters, Digits, Hyphens) plus dots: [a-zA-Z0-9.-] hostname or an IPv4 address literal" to kill IDNA once and for all. For the full URL, we restrict the character set to RFC 3986 unreserved plus the structural delimiters we actually need (:/, ?, #, @). Reject any byte outside printable ASCII (0x21–0x7E) that isn't percent-encoded.
There's a specific URL fuzzer we might consider here: https://github.com/orangetw/Tiny-URL-Fuzzer https://github.com/orangetw/Tiny-URL-Fuzzer/blob/master/samples.txt could also test against
https://payjo.in\@evil.com/path
https://payjo.in%40evil.com/path
https://payjo.in#\@evil.com
https://payjo.in/../evil.com
https://payjo.in/%2e%2e/evil.com
https://evil.com:443@payjo.in/
https://payjo.in%00.evil.com/path
https://payjo.in:@evil.com/
not that these are reall attacks but just so we knwo we parse the same as url for stuff we could actually encounter. Low prio but I wanted to document.
| } | ||
|
|
||
| #[derive(Debug, Clone, PartialEq, Eq)] | ||
| #[allow(dead_code)] |
There was a problem hiding this comment.
note to self that this is removed by the time the PR's last commit comes around.
There was a problem hiding this comment.
Do we want support for ipv4 or ipv6 hosts in payjoin::Url if not we can just keep Host::Domain.
There was a problem hiding this comment.
I think for testing especially this is still useful. What do you think?
|
Tnull says "Def. no huge blocker if there is a path to dropping it!" but this seems pretty darn close so I'd like to close it out to keep the prs flowing |
652f4a7 to
4fca182
Compare
96f3353 to
b3bcf75
Compare
2441088 to
29b6648
Compare
xstoicunicornx
left a comment
There was a problem hiding this comment.
Have a bit of feedback.
One other thing, seems like we are replicating the url::form_urlencoded::parse method in two different places so maybe its worth creating a native implementation of this in url.rs?
| scheme.push(c); | ||
| } | ||
| ':' => break, | ||
| _ => return Err(ParseError::InvalidCharacter), |
There was a problem hiding this comment.
Maybe just use ParseError::InvalidScheme instead, as this is more consistent with how parse_port errors are handled
| let mut path = String::new(); | ||
| let mut query: Option<String> = None; | ||
| let mut fragment: Option<String> = None; | ||
|
|
||
| if let Some(frag_pos) = input.find('#') { | ||
| let before_fragment = &input[..frag_pos]; | ||
| fragment = Some(input[frag_pos + 1..].to_string()); | ||
|
|
||
| if let Some(q_pos) = before_fragment.find('?') { | ||
| path.push_str(&before_fragment[..q_pos]); | ||
| query = Some(before_fragment[q_pos + 1..].to_string()); | ||
| } else { | ||
| path.push_str(before_fragment); | ||
| } | ||
| } else if let Some(q_pos) = input.find('?') { | ||
| path.push_str(&input[..q_pos]); | ||
| query = Some(input[q_pos + 1..].to_string()); | ||
| } else { | ||
| path.push_str(input); | ||
| } |
There was a problem hiding this comment.
This is a bit confusing to follow why not just:
| let mut path = String::new(); | |
| let mut query: Option<String> = None; | |
| let mut fragment: Option<String> = None; | |
| if let Some(frag_pos) = input.find('#') { | |
| let before_fragment = &input[..frag_pos]; | |
| fragment = Some(input[frag_pos + 1..].to_string()); | |
| if let Some(q_pos) = before_fragment.find('?') { | |
| path.push_str(&before_fragment[..q_pos]); | |
| query = Some(before_fragment[q_pos + 1..].to_string()); | |
| } else { | |
| path.push_str(before_fragment); | |
| } | |
| } else if let Some(q_pos) = input.find('?') { | |
| path.push_str(&input[..q_pos]); | |
| query = Some(input[q_pos + 1..].to_string()); | |
| } else { | |
| path.push_str(input); | |
| } | |
| let (before_fragment, fragment) = match input.find('#') { | |
| Some(pos) => (&input[..pos], Some(input[pos + 1..].to_string())), | |
| None => (&input[..], None), | |
| }; | |
| let (path, query) = match before_fragment.find('?') { | |
| Some(pos) => | |
| (before_fragment[..pos].to_string(), Some(before_fragment[pos + 1..].to_string())), | |
| None => (before_fragment.to_string(), None), | |
| }; |
| Some(ref h) if h.is_empty() => return Err(ParseError::EmptyHost), | ||
| Some(Host::Domain(ref d)) | ||
| if !d.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '.') => | ||
| return Err(ParseError::InvalidHost), |
There was a problem hiding this comment.
Wouldn't it make more sense for these parsing errors to be caught in parse_host (would also be more consistent with the other parsing functions)?
Also, if we are erroring for ParseError::EmptyHost why does the host need to be an Option<Host> rather than just Host? This validation seems to ensure that the host will always exist. Not using an Option<Host> would also eliminate the need for has_host.
There was a problem hiding this comment.
Good point, this simplifies the logic and any mutants created from host quite a bit
This commit implements the minimal native Url Struct and validation logic to be able to replace the Url dep from within payjoin excluding when utilizing any external Url interfaces like when using reqwest in the io feature. Co-authored-by: xstoicunicornx <xstoicunicornx@users.noreply.github.com>
This commit migrates the monorepo away from the external Url dep to use the new internal Url. Additionally due to the transition we need to add a dep for url encoding with `percent-encoding-rfc3986` which coincidentally get us inline with the bitcoin_uri crate.
Closes #1124
This adds the internal Url Struct to no longer depend on the url crate directly for it.
However we still have reqwest inside payjoin url from payjoin-directory and test-utils which pull in the url dep, but that is only available on the
iofeature.I have also added a fuzz target, which already helped me better shape the parse function better.
I have only ran the fuzzer for a limited amount of time however and some more cpu hours might be helpful
Claude really helped me get through the brunt of this.
You can test the lack of an accessible url dep not including the io feature with this command.
Pull Request Checklist
Please confirm the following before requesting review:
AI
in the body of this PR.