Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: resolve connection targets and permit connecting via public DID #2409

Merged

Conversation

dbluhm
Copy link
Contributor

@dbluhm dbluhm commented Aug 10, 2023

This PR builds on #2404 to adjust DID -> ConnectionTarget look up to use the DID Resolver interface and also to permit connecting via DID Exchange with public DIDs. This enables DIDComm over exclusively did:web and other methods (permits not rotating to a peer DID during the exchange) or just initializing a DID Exchange with a resolvable DID that supports DIDComm.

I believe this change is a significant improvement for consistently handling DIDs and DID Documents during connections as there is no longer a distinction between local/peer DIDs and public DIDs, at least in terms of determining how to look up the associated document.

This PR is currently in development. Once #2404 is in, I'll rebase this branch to clean up the history. There's a few more pieces I'm still working on as well:

  • Unit tests. I've tested these changes manually and it's currently working as expected but I'll be a good citizen and make sure my additions are covered.
  • Caching resolved DID Documents. This change relies on resolving DID Documents to determine connection targets. For exchanged (legacy) peer DIDs, this means it will look up the document in the wallet, which is more or less the same as it was before. However, if you're resolving public DIDs, we probably don't want to query a ledger every time we send a message, especially since DIDComm message exchanges tend to be burst-y. feat: add legacy peer did resolver #2404 includes caching for legacy peer DIDs; however, I'm not certain whether caching should be handled at the method resolver layer or at the global resolver layer. I'll put some more thought into this.

cc @swcurran @Jsyro I'll elaborate how I think this should affect the Peer DID work further in #2249

Summary as of 2023-08-21

Refactors: Connection Management

  • Moved all core connection management functions out of aries_cloudagent.protocols.connections.v1_0.manager into aries_cloudagent.connections.base_manager. This should have been done at the first introduction of the base manager. I did this now since I'd already invested the time into getting familiar with the connection code again and it resulted in a cleaner setup for testing.
  • Moved all testing for the same functions to appropriate location (aries_cloudagent.connections.tests.test_base_manager).
  • Refactored fetch_connection_targets, decreasing the cyclomatic complexity (as reported by mccabe) from 9 to 4. Also added comments in an attempt to make the various checks less cryptic.
  • Corrected type hints as appropriate for various methods.
  • Increased unit test code coverage for connection management. Mostly just covering some straggling edge cases to make sure branches were hit.
  • Dispatcher populates connection context using BaseConnectionManager instead of ConnectionManager.

Features: using Public DID during DID Exchange

It is now possible to use DIDComm over public DIDs without rotating out for "peer" DIDs during DID Exchange. There's a good case to be made for not doing this and always rotating out for a peer DID but some use cases can't be addressed in this way. Additionally, nothing in the DIDComm spec explicitly requires that DIDComm only takes place using "peer" DIDs.

Some implementation details:

  • On POST /didexchange/create-request (sending a DID Exchange request to a public DID), you can specify use_public_did. This has been possible for some time. What's new in this change is that a DID Doc attachment is not generated for the public DID and is therefore not included in the DID Exchange request. On receipt of such a request, ACA-Py will resolve the Document of the DID and extract information to store about the connection.
  • On POST /didexchange/{conn-id}/accept-request you can now specify use_public_did, just like the parameter of the same name for create-request. This will have a similar effect but influences the DID Exchange response instead of the request.
  • Public DIDs were previously sent unqualified; if the current public DID for the ACA-Py instance is unqualified, it will now be fully qualified in DID Exchange requests and responses.
  • Connecting to yourself through your own public DID using your own public DID is explicitly forbidden to avoid nasty edge cases. This scenario has questionable use anyways.
  • Attempting to connect to a public DID using your public DID when a connection already exists for the same DIDs is explicitly forbidden. The existing connection should be used instead.
  • Added BaseConnectionManager.resolve_connection_targets. This replaces the original mechanism of looking up stored DID Documents enabling the Documents to be resolved as appropriate for a given DID. This uses the DID Resolver interface to perform Document resolution. Stored DID Documents are retrieved from the wallet through the DID Resolver interface and the LegacyPeerDIDResolver added in feat: add legacy peer did resolver #2404.
  • Added BaseConnectionManager.record_keys_for_public_did. In order to determine the connection for an inbound message, you must look up the DID associated with the sender keys. When receiving a DID Document as an attachment, this occurs on storage of the document. Since we don't store DID Documents for public DIDs but rather just resolve them on demand (with caching, see below), record_keys_for_public_did is used on receipt of a public DID in a DID Exchange request or response to resolve and record the keys associated with the DID for use in this lookup process later. Changes to the keys found in a DID Document that take place after the DID Exchange are NOT being recorded right now, even though these changes would be visible when we later resolve the Document again. This will have to be covered in future work.
  • Made BaseConnectionManager.add_key_for_did idempotent. It's possible to form connections with the same public DID using peer DIDs on our end (suppose you wanted to maintain separate personas with the same entity). This change keeps the key-to-DID look up from containing duplicate records.

Features: Resolved DID Document Caching

With the DID Resolver becoming a more critical component in messaging, it is necessary to make sure we're retrieving DID Documents efficiently. Especially given that DIDComm message exchanges tend to occur in bursts, Documents don't need to be refreshed with each new outbound message to a given DID. This PR adds caching to the DID Resolver.

Implementation Details:

  • The caching was added into the BaseResolver ABC. This is the class implemented by resolvers for a given DID Method. Instances of this class are registered with the "global" DID Resolver which will delegate resolution of a DID to method resolvers based on the method of the DID.
  • With caching implemented at this layer, method resolvers have the ability to tune cache TTL for the documents as appropriate for the DID Method. An example of this can be seen in the LegacyPeerDIDResolver which overrides the caching behavior so that it can store some additional metadata in the cache.
  • The default TTL is 3600 seconds or one hour.

@dbluhm
Copy link
Contributor Author

dbluhm commented Aug 15, 2023

Thoughts on Caching:

Caching at the global resolver layer would mean that we don't have to worry about it at the resolver layer but prevents the sub resolver from tuning caching to its specific needs.

Caching at the method resolver layer means that the resolver can tune to its needs but each resolver would have to worry about caching. This could also introduce some inefficiencies.

@swcurran
Copy link
Contributor

Would it be worth the lower level just providing a default TTL parameter that the global resolver could use? And perhaps a cache invalidation mechanism that the registar for a did method could trigger?

@dbluhm
Copy link
Contributor Author

dbluhm commented Aug 16, 2023

Tried a couple of things and landed on implementing the caching in the base resolver in a way that can be overridden by the method specific resolvers.

@dbluhm dbluhm force-pushed the feature/doc-resolution-consistency branch 2 times, most recently from 16ce100 to 7655f9e Compare August 18, 2023 16:26
@dbluhm
Copy link
Contributor Author

dbluhm commented Aug 18, 2023

A big (by line number) change I made that merits comment: I moved all code that was not specific to the connections protocol manager out of that manager and into the base manager. I think this should have been done at the introduction of the base manager right around when we added support for did exchange. I believe there are lingering references to using the Connections protocol manager directly rather than using the base manager. This isn't a problem but I would like to clean this up in the future.

The main reason I did this as part of these changes is because a significant portion of my changes in this PR were to the base manager and it felt wrong to write tests for that in the connections protocol manager's tests (which is where all the tests used to live for the base manager until these changes).

@dbluhm dbluhm marked this pull request as ready for review August 18, 2023 20:28
@dbluhm
Copy link
Contributor Author

dbluhm commented Aug 18, 2023

@swcurran @Jsyro This is now ready for review

@swcurran
Copy link
Contributor

@usingtechnology — you too! :-)

@dbluhm
Copy link
Contributor Author

dbluhm commented Aug 21, 2023

Temporarily moving this back to draft (it's still ready to receive feedback though!) so I can address a scenario currently resulting in errors:

sequenceDiagram
    participant alice as Alice
    participant bob as Bob

    bob ->> bob: has public DID
    alt Request through public DID
    alice ->> alice: has public DID
    else Request through invitation
    alice ->> bob: OOB invitation
    end

    loop Run twice
    bob ->> alice: request with public DID
    alice ->> bob: response with (legacy) peer DID
    bob ->> alice: complete
    end

    bob ->> alice: trust-ping
Loading

If Bob connects to Alice through a public DID twice and Alice responds with a "peer" DID both times, Alice will call record_keys_for_public_did twice for Bob's DID. This causes the following error when bob sends a message to alice:

alice_1             | Traceback (most recent call last):
alice_1             |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
alice_1             |     result = coro.send(None)
alice_1             |   File "/aries_cloudagent/core/dispatcher.py", line 238, in handle_message
alice_1             |     connection = await connection_mgr.find_inbound_connection(
alice_1             |   File "/aries_cloudagent/connections/base_manager.py", line 766, in find_inbound_connection
alice_1             |     connection = await self.resolve_inbound_connection(receipt)
alice_1             |   File "/aries_cloudagent/connections/base_manager.py", line 797, in resolve_inbound_connection
alice_1             |     receipt.sender_did = await self.find_did_for_key(receipt.sender_verkey)
alice_1             |   File "/aries_cloudagent/connections/base_manager.py", line 235, in find_did_for_key
alice_1             |     record = await storage.find_record(self.RECORD_TYPE_DID_KEY, {"key": key})
alice_1             |   File "/aries_cloudagent/storage/askar.py", line 167, in find_record
alice_1             |     raise StorageDuplicateError("Duplicate records found")
alice_1             | aries_cloudagent.storage.error.StorageDuplicateError: Duplicate records found

This type of a scenario, if using an out of band invitation, can optionally result in a connection reuse by Alice. But even if we don't reuse the connection, we can avoid this issue by only recording the keys for the DID once. The connection context of the message should be correctly assigned on receipt of the message based on the combination of recipient and sender.

@dbluhm dbluhm marked this pull request as draft August 21, 2023 16:41
@swcurran
Copy link
Contributor

@usingtechnology @Jsyro — would appreciate you reviewing this in preparation for releasing 0.10.0. Let’s plan on discussing this at the ACA-Py call tomorrow. @dbluhm could you please come prepared to go over this one — what it enables and what it entails, including the caching ramifications?

Thanks!

@usingtechnology
Copy link
Contributor

I'll take a look today so I am prepared too.

self._logger.debug("Received DID Doc attachment in request")
async with self.profile.session() as session:
wallet = session.inject(BaseWallet)
conn_did_doc = await self.verify_diddoc(wallet, request.did_doc_attach)
Copy link
Contributor

@Jsyro Jsyro Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the DIDDoc deserialize throws an error? the DIDDoc class looks for publickey instead of verificationMethods.

https://github.com/dbluhm/aries-cloudagent-python/blob/4de1a12586456feef1fed231610f1391ba96bf99/aries_cloudagent/connections/models/diddoc/diddoc.py#L216

Do we need to check the context attribute of the diddoc before deserializing?

Copy link
Contributor

@Jsyro Jsyro Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment above may be irrelevant if Is this verifying a did_doc_attach from the other agent instead of one resolved from a public ledger?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unchanged from the original handling of DID Document attachments. I would consider any changes here to be out of scope for this PR. For changes to this behavior beyond this PR, I think we effectively side step it altogether by using Peer DIDs (we won't include an attached DID Doc for peer DIDs).

conn_rec.my_did = my_info.did
did = my_info.did
if not did.startswith("did:"):
did = f"did:sov:{did}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do public unqualified did's exist? is this how we should handle them?

Copy link
Contributor Author

@dbluhm dbluhm Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All Indy style DIDs in ACA-Py's wallet are stored unqualified. This includes public Indy DIDs so this check and prepending did:sov: is necessary for the DID that we send in the request/response to be fully qualified.

When we directly support did:indy DIDs, they should be properly stored with their method identifier.

@dbluhm
Copy link
Contributor Author

dbluhm commented Aug 21, 2023

I've added a more detailed summary of changes to the description of this PR in preparation for ACA-PUG tomorrow.

@dbluhm dbluhm marked this pull request as ready for review August 21, 2023 19:12
@usingtechnology
Copy link
Contributor

I need to leave a comment here that this documentation you (@dbluhm) provided is above and beyond and to which we should all aspire.

My knowledge of the DID protocols is lacking, so I won't notice issues or have questions that @Jsyro may raise, but the changes look good to me.

Leaving one question in my review.

Copy link
Contributor

@usingtechnology usingtechnology left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

Copy link
Contributor

@Jsyro Jsyro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APPROVED. LGTM!

dbluhm added 15 commits August 23, 2023 16:14
This is required in order to correlate sender verkeys back to a DID and
connection on receipt of a message.

Signed-off-by: Daniel Bluhm <[email protected]>
These cases break assumptions about uniqueness of connections.

Signed-off-by: Daniel Bluhm <[email protected]>
Removing everything from the connections v1 manager that wasn't specific
to the connections protocol itself.

Signed-off-by: Daniel Bluhm <[email protected]>
@dbluhm dbluhm force-pushed the feature/doc-resolution-consistency branch from 2de2212 to 7f31d19 Compare August 23, 2023 20:14
@dbluhm dbluhm enabled auto-merge August 23, 2023 20:14
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants