Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define Algebra.Definitions.RawMonoid.sum using Data.Vec.Vec #1886

Open
XiaohuWang0921 opened this issue Dec 12, 2022 · 9 comments
Open

Define Algebra.Definitions.RawMonoid.sum using Data.Vec.Vec #1886

XiaohuWang0921 opened this issue Dec 12, 2022 · 9 comments

Comments

@XiaohuWang0921
Copy link
Contributor

Is there a particular reason that the finite summation over a monoid is defined using Data.Vec.Functional.Vector, the functional definition for finite vectors, instead of the more natural choice of using the inductive definition from Data.Vec.Base? If not, I think it should be rewritten using the inductive Vec as the inductive definition allows pattern matching, which makes many properties of the finite summation easier to prove.

@gallais
Copy link
Member

gallais commented Dec 12, 2022

I'm guessing one of the motivations was: https://personal.cis.strath.ac.uk/james.wood.100/blog/html/VecMat.html#16679

@XiaohuWang0921
Copy link
Contributor Author

Fair enough, but consider a simple equation that is used almost every day:

$$\sum_{k=1}^{m+n}a_k=\sum_{k=1}^ma_k+\sum_{k=m+1}^{m+n}a_k$$

In Agda, it would be something like this (given the same imports and declarations as in Algebra.Properties.Monoid.Sum):

sum-++ :  {m n} (xs : Vector A m) (ys : Vector A n)  sum (xs ++ ys) ≈ sum xs + sum ys

I was surprised to find it absent in the stdlib, so I tried to prove it myself, and it took significantly more effort than it would have if we were to simply replace Vector with Data.Vec.Vec, and required adding the following lemmas to Data.Vec.Functional.Properties:

tail-++ :  {m n} (xs : Vector A (suc m)) (ys : Vector A n)  tail (xs ++ ys) ≗ tail xs ++ ys
foldr-cong :  f x {n} (xs ys : Vector A n)  xs ≗ ys  foldr f x xs ≡ foldr f x ys

whose corresponding statements for Data.Vec.Vec are too trivial to require separate proofs.
In my personal opinion such basic equalities like the one mentioned above should be given higher priority than formalising vector/matrix operations, as it would pop up more often also in areas other than linear algebra.

@Taneb
Copy link
Member

Taneb commented Dec 14, 2022

I kind of feel like should be moved from Algebra into the Data heirarchy as operations on the data type being summed over.

@jamesmckinna
Copy link
Contributor

jamesmckinna commented Dec 14, 2022

There seem to be many possible competing choices/forces at work here:

  • the sum-++ lemma you mention is available in Data.Vec.Properties; so one way to resolve your issue is to transport this property across the iso between Vec and Vector; the stdlib guidelines, as I understand them, emphasise that there should be multiple implementations of the same thing only with very good reason, especially so in the case of (computationally irrelevant) properties;

  • different implementation types are inevitable in the presence of computations (as described in @laMudri 's blog post) sufficiently sensitive to such choices; there is (likely to be) no 'one data type fits all' approach which will work well in all cases, so any appeal to, or claims about, any representation being more 'natural' (or otherwise), seem moot;

  • the above situation isn't especially helped by the (natural?) desire to program and prove to specific dependently-typed implementations (eg, as you observe, precisely because of the convenience of pattern matching), rather than supplying abstract interfaces and working with those, especially as Agda's support for such abstractions, or perhaps only stdlib developers' limitations in using them, leaves much to be desired; although one might perhaps see the essence of algebra, and developments in the Algebra.* hierarchy, as trying to codify abstract equational signatures which particular implementations might satisfy (in an implementation-independent way, ie. 'modulo isos');

  • as to where such computations live, or should live, there is from time to time lively discussion between stdlib contributors/maintainers as to the 'best' place for that: I've argued elsewhere that both sum, and its property, sum-++, do not even belong in the library at all (or at least, not in Data.List.Base/Data.List.Properties; this would remove Data.Nat.* from the imports, at least for the latter, which would be a huge saving, cf. my comment on Add Haskell's Foldable #1099 ), as they are instances of more general computations/properties based on foldr(eg, see PR sum as an accumulating fold; sum-++ by way of fold-++ #1712) ... (or at least, such specialised combinations of polymorphic vector operations with specific datatypes should be isolated in their own small modules); I happen to disagree with @Taneb about moving such things into Data.*, because from one/my perspective, these things typically correspond, by specialising the folds, to proving homomorphism properties between various instances of (additive) monoids, and such things don't belong in Data.*, but rather Algebra.*, imho (see issue Where should homomorphisms live in the stdlib hierarchy? #1871, Length is a monoid homomorphism #1863, List.length is a monoid homomorphism #1857, Data.Parity and the 2+_ pattern #1851 etc.)

But in the end, the stdlib in both its development history and its current state, represents a collective set of compromises and trade-offs, bearing the above forces in mind (along with individual stakeholders' commitments to minimalist/maximalist approaches to library design, in terms of module contents).

As a volunteer effort, we rely on contributions in the form of PRs, and the subsequent, often lengthy, review/revision cycle. With a major (v2.0) milestone and library release forthcoming ('soon'), I could imagine there being some pushback in the short term to changing design decisions about implementation types for various library functions, especially if there is a performance hit on eg Tactic invocations. But as @JacquesCarette has pointed out in other issues, unless we know for sure (esp. wrt timings, import dependencies, ...) about the impact of such changes, we remain in the realm of mere speculation.

@XiaohuWang0921
Copy link
Contributor Author

@jamesmckinna Thank you for your comment, it is very comprehensive and I haven't yet given thought to everything you've mentioned. Here is what I have thought about so far:

  • the sum-++ lemma you mention is available in Data.Vec.Properties; so one way to resolve your issue is to transport this property across the iso between Vec and Vector; the stdlib guidelines, as I understand them, emphasise that there should be multiple implementations of the same thing only with very good reason, especially so in the case of (computationally irrelevant) properties;

Actually it is not. sum-++ in Data.Vec.Properties refers to the sum defined in Data.Vec.Base, which only defines finite summations over natural numbers. I however need the corresponding statement for finite summations over an arbitrary monoid.

As for the question where computations should live, I believe the answer to that depends much on what kind of abstraction and generalisation we want to introduce in the stdlib. Take the Euclidean algorithm for example. Currently, it lives in Data.Nat.GCD under the Data.* hierarchy. However, one could argue that since this algorithm could be generalised to all Euclidean domains, it should rather be put under the Algebra.* hierarchy along with its properties, making most of Data.Nat.GCD as well as Data.Integer.GCD redundant. However, it is only possible once the developers of stdlib decide to actually introduce the concept of Euclidean domains.

@jamesmckinna
Copy link
Contributor

@XiaohuWang0921 thanks for your careful correction to my earlier message; my apologies for writing in haste without checking things carefully. For the general case of a Monoid, Matthew Daggitt's comments on my earlier PR #1712 may offer some helpful insight, but your general points:

  • what should be the appropriate container representation for summations;
  • where in the hierarchy should the computations and their properties live

are still very much live, and interesting, topics for discussion.

@JacquesCarette
Copy link
Contributor

Comments on this:

  • having a sum like that in Data.Vec.Base (and that's a bad place for it too) that is so restricted is a waste of stdlib space. Data.Vec.Base should define various folds and some higher-level spot in the library can decide to create a special case.
  • we ought to define more interfaces (here: variants on Foldable) that can be re-used
  • I agree that the Euclidean Algorithm should exist in Algebra.*. This should happen in a future refactoring once the needed pieces exist. That things are in Data.Nat.GCD right now is what happens when you do "from the ground up" evolution of a library. It's the best way to get a library populated, and an awful way to do long-term design.

For findability and usability by beginners, the specializations of the general algorithms should still exist and be exported - but be coded as specializations, not copies.

@jamesmckinna
Copy link
Contributor

jamesmckinna commented Jan 24, 2023

Re: foldr-cong in the above development:
this, too, is available (whether in the full generality you seek, I'm not sure, but I think so) under Data.Vec.Functional, but not perhaps in the 'obvious' place, namely here:
Data.Vec.Functional.Relation.Binary.Equality.Setoid since PR #1351
Edited: see also the discussion of the Data.Vec.Functional representation and its properties under issue #1731

@jamesmckinna
Copy link
Contributor

... and the preceding/associated/related issues PRs which have touched on 'algebraic' properties of foldr (on whatever structure): #1287, #1377, #1387, #1712, #1281, #1099, ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants