Implement RDFC-1.0 by mielvds · Pull Request #245 · digitalbazaar/pyld

mielvds · 2026-02-25T09:06:22Z

This PR is a rather big one. It fully implements URDNA2015, URDNA2012 and RDFC-1.0 and has complete test-suite coverage for these algorithms (with the exception of the test on poisonous datasets, but more about that later). It 'fixes' #220.

Some general remarks before I dive into the details:

I went back and forth on how to approach this: move the canon stuff to a new repo first, or fix the problems first. I sided with the latter as the test harness is already in place and the code is too tangled to move without breaking everything.
This introduces rdflib into PyLD, but limited to the normalization/canonicalization code in canon.py. This made the code much cleaner, but ironically didn't fix what I introduced it for: N-Quads serialization. The relevant methods are copied over from rdflib until I can turn them into PRs for rdflib.
I added some functions to transform the legacy RDF.JS dataset structure into an rdflib.Dataset and back. This should ensure backwards compatibility for some methods such as jsonld.normalization() and URDNA2015.main().
While switching to rdflib introduces significant changes, I tried to change as little as possible otherwise. Optimizing the algorithm implementations can be done later.

Overview of the changes:

added rdflib dependency in all relevant config and code files.
the switch to rdflib mostly affected:
- RDF term type checking (e.g., checking if something is a bnode)
- looping over triples/quads
- serialization
- constructing RDF terms (custom deepcopy is no longer needed)
util.py
- added the functions from_legacy_dataset() and to_legacy_dataset() to easily go from an RDFJS-like dict to an rdflib.Dataset and back wherever needed.
- Also added unit tests in tests/test_util.py for these functions. We might want to move more general-purpose methods to this file.
canon.py:
- the main logic was moved to URDNA2015._canonicalize(self, dataset: Dataset) while handling input and output remained in URDNA2015.main(). The latter now does the nquads parsing instead of JsonLdProcessor.normalize(), so all parsing and serialization is handled by the same class.
- the method URDNA2015._canonicalize(self, dataset: Dataset) accepts an rdflib.Dataset and returns a tuple with
  - the canonicalized result as a nquads str and
  - the blank node identifier map as dict.
- The method URDNA2015.main(self, dataset: str | dict | Dataset, options) now accepts an rdflib.Dataset object in addition to an nquads str or the original RDFJS-like dict. It returns
  - a str: the serialized nquads result, or
  - a dict: the result as RDFJS-like dataset or the blank node identifier map when the new parameter outputMap is True.
- the hashing algorithm is now a class attribute URDNA2015.hash_algorithm, so it is easily configurable (required for RDFC-1.0)
- the permutations() function now uses itertools.permutations instead of a custom implementation.
- replacements for rdflib's _nq_row and _quoteLiteral, which should eventually move to a fix for rdflib's nquads serializer.
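The two smaller refactorings listed for canon.py, the class-level hash algorithm and the stdlib permutations, could look roughly like the sketch below. This is an illustration only, not the code from this PR: the class names `CanonSketch` and `CanonSketchSha384` and the method `hash_nquads` are hypothetical, and the default of SHA-256 is an assumption. Only the attribute name `hash_algorithm` and the use of `itertools.permutations` are taken from the description above.

```python
import hashlib
from itertools import permutations


class CanonSketch:
    """Hypothetical stand-in for a canonicalization class (not the PR's code)."""

    # Class attribute naming a hashlib algorithm; subclasses can override it.
    hash_algorithm = "sha256"  # assumed default for this sketch

    def hash_nquads(self, nquads):
        """Hash a list of serialized N-Quads lines with the configured algorithm."""
        md = hashlib.new(self.hash_algorithm)
        for line in nquads:
            md.update(line.encode("utf-8"))
        return md.hexdigest()


class CanonSketchSha384(CanonSketch):
    """Same logic, different digest: only the class attribute changes."""

    hash_algorithm = "sha384"


quads = ['_:a <http://example.org/p> "v" .\n']
digest_256 = CanonSketch().hash_nquads(quads)
digest_384 = CanonSketchSha384().hash_nquads(quads)

# itertools.permutations yields tuples lazily, so no custom generator is needed.
orderings = list(permutations(["_:b0", "_:b1", "_:b2"]))
```

Keeping the algorithm name on the class means a variant that needs a different digest only has to override one attribute, which is presumably why the change is described as a requirement for RDFC-1.0.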
tests/runtests.py
- (re-)enabled all skipped URDNA2015 and URDNA2012 tests
- Added the RDFC10 tests
- added support for testing blank-node identifier maps. If the result of a test is a dict and the expected value is a string, the expected value is now parsed as JSON.
- added support for testing with different hashing algorithms
- !! I did not include test 74c on dataset poisoning because I'm not sure how to handle that yet. We need to discuss possible guardrails first before continuing on this.
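The blank-node identifier map comparison described in the list above might be sketched as follows. The helper name `results_match` and the sample map are hypothetical and not taken from tests/runtests.py; the sketch only shows the rule that a dict result is compared against the JSON-parsed expected string.

```python
import json


def results_match(result, expected):
    """Compare a test result with its expected value.

    If the result is a dict (e.g. a blank node identifier map) and the
    expected value is a string, the string is parsed as JSON first.
    """
    if isinstance(result, dict) and isinstance(expected, str):
        expected = json.loads(expected)
    return result == expected


# Illustrative identifier map; key order does not matter for dict equality.
issued = {"e0": "c14n0", "e1": "c14n1"}
expected_file_content = '{"e1": "c14n1", "e0": "c14n0"}'
map_ok = results_match(issued, expected_file_content)

# Plain N-Quads results are still compared as strings.
nquads_ok = results_match("<a> <b> <c> .\n", "<a> <b> <c> .\n")
```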

It's possible that we move this code out before ever merging, but at least it can be easily tested. That said, when importing this functionality as an external library, we include rdflib anyway. It therefore makes sense to continue replacing the RDFJS-like data structures elsewhere in the code, essentially making it fully rdflib compliant and dependent.

Since this involves RDFLib: @nicholascar, anyone who can maybe have a look at this? Also @davidlehn @BigBlueHat @anatoly-scherbakov

github-actions · 2026-02-25T09:06:59Z

Coverage Report

File	Stmts	Miss	Cover	Missing
lib/pyld
canon.py	278	49	82%	34, 44, 48–53, 67, 77–78, 507, 529, 589–594, 599–608, 612–636, 657–658
context_resolver.py	102	8	92%	70, 129, 143, 189, 195–197, 221
iri_resolver.py	141	3	98%	307–308, 318
jsonld.py	2436	176	93%	274–278, 359, 396, 401, 411, 428–443, 479–480, 522, 530, 550, 558–559, 667–672, 752–753, 768–769, 829–834, 859–860, 875–876, 886–887, 912–913, 959, 968–974, 981, 991, 1023, 1032, 1039, 1104, 1127, 1149–1151, 1161–1164, 1179, 1202, 1214–1217, 1309, 1321–1324, 1344, 1373, 1406, 1410, 1429–1430, 1447, 1490, 1508–1511, 1520, 1565, 1668–1675, 1702–1704, 1709, 1730–1733, 1909, 1930, 2358, 2367–2374, 2466, 2493, 2516, 2525, 3041, 3094, 3097, 3160, 3245, 3279, 3287, 3553, 3558, 3560, 3609, 3747–3751, 3817, 3854, 3917, 4137–4138, 4152, 4440, 4572, 4606, 4635–4638, 4664–4665, 4848, 4912, 4939, 5013, 5015, 5033, 5131, 5180, 5412, 5414, 5453, 5455, 5464, 5603, 5729, 5816, 5936–5950, 6009–6015, 6082, 6295, 6301–6305, 6344, 6347, 6437–6438, 6447
nquads.py	119	4	97%	79, 119, 243–244
util.py	58	2	97%	41, 99
lib/pyld/documentloader
aiohttp.py	65	22	66%	28–38, 69, 75, 87, 99–107, 118–121, 138, 153–155
requests.py	41	5	88%	55, 69, 81–89
TOTAL	3275	269	92%

Tests	Skipped	Failures	Errors	Time
1669	57 💤	0 ❌	0 🔥	11.249s ⏱️

mielvds added 2 commits April 14, 2026 09:19

Add stub for RDFC-1.0 and prepare tests.

c526c16

Add rdf-canon to submodules

e663b59

mielvds force-pushed the 220-RDFC10 branch from 5737882 to e663b59 April 14, 2026 07:19

mielvds added 17 commits April 15, 2026 16:17

Add rdflib to dependencies

dcf1e08

Re-enable canon tests

03bb3d0

Introduce rdflib into canon.py and replace internal model.

93dc4b4

Don't normalize literals

accabce

Fixes positions and does small touchups

d61d3d4

Replace permutations with stdlib

eaf319c

Fix identifier ordering

82e8ae3

Switch from NT to NQ serialization

d61cdf1

Remove legacy nquads parsing dependency by moving parsing to canon.py

c7a2ea5

Move triple data structure convert functions to util.py

aaf44ed

Cleanup of old canon code

ca6d162

Add tests for conversion methods in util.py and do fixes

71594c9

Make NQ serialization part of class & add override for RDFC1.0

d1c41be

Fix RDFC1.0 literal encoding

67cd2ab

Add configurable hashAlgorithm

34bef4c

Add option to return the bnode map.

e72522e

Rename _main to _canonicalize, add docstring and move function

f13a99c

mielvds marked this pull request as ready for review April 23, 2026 08:13

Minimize change

a437bc2

mielvds mentioned this pull request Apr 23, 2026

Use pyld as the json-ld parser for rdflib RDFLib/rdflib#2308

Open

3 tasks

mielvds requested review from BigBlueHat, anatoly-scherbakov and davidlehn April 23, 2026 12:18

mielvds wants to merge 20 commits into master from 220-RDFC10
