Open
Conversation
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR is a rather big one. It fully implements URDNA2015, URDNA2012 and RDFC10 and has complete test-suite coverage on these algorithms (with the exception of the one on poisonous datasets, but more about that later). It 'fixes' #220
Some general remarks before I dive into the details:
canon.py. This made the code much much cleaner, but ironically didn't fix what I introduced it for: nquad serialization. The relevant methods are copied over from rdflib until I can turn them into PRs for rdflib.rdflib.Datasetand back. This should ensure backwards-compatibility on some methods such asjsonld.normalization()andURDNA2015.main().Overview of the changes:
rdflibdependency in all relevant config and code files.rdflibmostly changedutil.pyfrom_legacy_dataset()andto_legacy_dataset()to easily go from anrdflib.Datasetto an RDFJS-likedictwherever needed.tests/test_util.pyfor these functions. We might want to move more general-purpose methods to this file.canon.py:URDNA2015._canonicalize(self, dataset: Dataset)while handling input and output remained inURDNA2015.main(). The latter now does the nquads parsing instead ofJsonLdProcessor.normalize(), so all parsing and serialization is handled by the same class.URDNA2015._canonicalize(self, dataset: Dataset)accepts anrdflib.Datasetand returns a tuple withstranddict.URDNA2015 .main(self, dataset: str | dict | Dataset, options)now accepts ardflib.Datasetobject in addition to an nquadsstror the original RDFJS-likedict. It returnsstr: the serialized nquads result, ordict: the result as RDFJS-like dataset or the blank node identifier map when the new parameteroutputMapisTrue.URDNA2015.hash_algorithmso it easily configurable (required for RDFC1.0)permutations()function now usesitertools.permutationsinstead of a custom implementation._nq_rowand_quoteLiteral, which should eventually move to a fix for rdflib's nquads serializer.tests/runtests.pyIt's possible that we move this code out before ever merging, but at least it can be easily tested. That said, when importing this functionality as an external library, we include rdflib anyway. It therefore makes sense to continue replacing the RDFJS-like data structures elsewhere in the code, essentially making it fully rdflib compliant and dependent.
Since this involves RDFLib: @nicholascar, anyone who can maybe have a look at this? Also @davidlehn @BigBlueHat @anatoly-scherbakov