Skip to content
This repository was archived by the owner on Nov 22, 2023. It is now read-only.

add Polygons + do long standing refactors #166

Merged
merged 11 commits into from
Mar 2, 2020
Merged

add Polygons + do long standing refactors #166

merged 11 commits into from
Mar 2, 2020

Conversation

SimonDanisch
Copy link
Member

No description provided.

@mkborregaard
Copy link
Contributor

mkborregaard commented Feb 3, 2019

I see you’ve merged and refactored.
Questions:

  1. does a polygon need to know if it’s a hole? I put that in MultiPolygon instead, because a polygon is just a polygon, it’s only a hole if it’s a hole in something. That something being the nonhole polygons in a MultiPolygon, given that we don't have the concept of "parent" polygons. I don't see it makes sense for a Polygon to be a hole on it's own.
  2. in for isinside? That would prevent us from exporting the function. And if we make it a method of Base.in that is an ambiguous use of the function I'd say. You might intuitively expect that to mean in(pt, vertices(polygon)).

Plan to factor out colortypes too?

@mkborregaard
Copy link
Contributor

Also, it looks good, nice work :-)

@SimonDanisch
Copy link
Member Author

in for isinside?

I just really love the syntax point in geometry, and I don't think it's dangerously ambigious, as long as we don't subtype it as AbstractVector. It seems intuitive to me, that one would do point in vertices(poly) to check if the point is actually part of the vertices.

does a polygon need to know if it’s a hole?

I guess not, I was just still inspired by the StructArray method and thought that would be the clear path forward. But actually, best argument against it would be, that one has a non-hole polygon, and tries to insert it into a multipolygon as a hole - that'd be pretty inconvenient! I'll change it

@visr
Copy link
Member

visr commented Feb 3, 2019

does a polygon need to know if it’s a hole?

A case for this is because the simple feature standard defines polygons to support holes, taken from https://r-spatial.github.io/sf/articles/sf1.html

POLYGON geometry with a positive area (two-dimensional); sequence of points form a closed, non-self intersecting ring; the first ring denotes the exterior ring, zero or more subsequent rings denote holes in this exterior ring

So if we want to read simple features polygons from GeoJSON.jl or GDAL.jl, we cannot convert it to GeometryTypes polygons if there are holes.

@mkborregaard
Copy link
Contributor

mkborregaard commented Feb 3, 2019

No the idea would be to convert it to a MultiPolygon, that's what that type is for. It allows multipart polygons at the same time. Would that be a problem?
I even think both MultiPolygon and Polygon could inherit from AbstractPolygon and have the same methods defined.

@mkborregaard
Copy link
Contributor

mkborregaard commented Feb 3, 2019

I just really love the syntax

Fair enough, it is nice. point inside polygon would be nice too though.

we don't subtype it as AbstractVector

With "it" you mean the polygon? Because AbstractPoint <: StaticVector right?

@mkborregaard
Copy link
Contributor

mkborregaard commented Feb 3, 2019

I do see the point in adhering to an ISO standard if it doesn't come at a cost in other contexts though. @visr would it be possible to implement geometries in accordance with the simple features standard in a way that is consistent with the GeometryTypes classes, and which relies could take advantage of the work you've already done in the Geo packages?

@visr
Copy link
Member

visr commented Feb 3, 2019

Yeah I'm not sure about it. Just wanted to point out that if we deviate from it here already, it will make interoperability more difficult. I don't think we can fix it by using multipolygons or this, since that is just a vector of polyons as a single feature.

If we can easily make sure this mostly works with simple features it's worth keeping an eye on that.

cc @yeesian

@mkborregaard
Copy link
Contributor

mkborregaard commented Feb 3, 2019

But FWIW I don't think we need to follow the same implementation details, as long as the types circumscribe the same information. Remember that SF had to be expressible in a very non-expressive language such as R. I mean wouldn't it be perfectly consistent with

struct ComplexPolygon <: AbstractPolygon
   poly::Polygon
   holes::Vector{Polygon}
end

struct MultiPolygon{T<:Union{ComplexPolygon, Polygon}} <: AbstractPolygon
  polys::Vector{T}
  bbox::Rect
end

etc


struct Polygon{T, A <: AbstractVector{<: AbstractPoint{2, T}}}
points::A
boundingbox::HyperRectangle{2, T}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is boundingbox required inside the definition of the Point? Can we make the case for applications that deems it expensive (and/or irrelevant) to compute, that don't wish to force it at construction? I think the LightGraphs approach will be to say that it should be metadata that's maintained outside of the type.

Same thing for isconvex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point (I think LightGraphs design philosophy is a bit too extreme, but that's beside the point). It is really a cache - it makes many operations like intersects potentially a lot faster, at the price of some time at construction. A third approach might be to have boundingbox be a Union{Rect, Nothing} and calculate it the first time boundingbox is called on the polygon?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is really a cache - it makes many operations like intersects potentially a lot faster, at the price of some time at construction.

It's why GEOS has normal geometries, and prepared geometries

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's a very C++-like way of dealing with that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

@andyferris andyferris Feb 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda agree with @yeesian. I generally prefer minimalism in the basic type definitions, and implement a generic way of appending metadata like this.

A more Julian way would be e.g. a BoundedGeometry that contains the AbstractGeometry and a boundingbox. Often you would compute/store this data as part of a spatial index.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense - the only worry I have is to make the type hierarchy too complex too early when what we really want is composability.


struct MultiPolygon{T, A <: AbstractVector{<: Polygon{T}}}
polygons::A
boundingbox::HyperRectangle{2, T}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is boundingbox required here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about this and I came to the conclusion, that we should strictly get rid of any additional information and just figure out a nice infrastructure to pass metadata to the simple types!
Otherwise it's too hard to draw a line, and type will never be "complete", since there will always be new use cases that require different metadata ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then the possibility of "prepared types" that could be made by the user or by batching functions I guess, just like @yeesian suggested?. I can see the logic and since I was ever the only one really wanting to store the extra metadata anyway let's go with that.

@yeesian
Copy link

yeesian commented Feb 3, 2019

Thanks for pushing on this Simon and Michael!

does a polygon need to know if it’s a hole?

I guess not, I was just still inspired by the StructArray method and thought that would be the clear path forward. But actually, best argument against it would be, that one has a non-hole polygon, and tries to insert it into a multipolygon as a hole - that'd be pretty inconvenient! I'll change it

Yeah it feels more natural to me for the MultiPolygon to know whether a given Polygon is a hole or not, than for a Polygon to know whether or not it's the hole of a MultiPolygon.

@yeesian
Copy link

yeesian commented Feb 3, 2019

But FWIW I don't think we need to follow the same implementation details, as long as the types circumscribe the same information. [...] I mean wouldn't it be perfectly consistent with

struct ComplexPolygon <: AbstractPolygon
   poly::Polygon
   holes::Vector{Polygon}
end

struct MultiPolygon{T<:Union{ComplexPolygon, Polygon}} <: AbstractPolygon
  polys::Vector{T}
  bbox::Rect
end

etc

Why not go one step further and just say that Polygons have to implement a holes() method, like

struct MultiPolygon{T, A <: AbstractVector{<: Polygon{T}}}
   polygons::A
   boundingbox::HyperRectangle{2, T}
 end

holes(mp::MultiPolygon) = (p for p in mp.polygons if ishole(p))

struct ComplexPolygon{A <: AbstractVector{<: Polygon{T}}}
   poly::Polygon
   holes::A
end

holes(cp::ComplexPolygon) = cp.holes

Then people can define whatever Polygons they want, and you can define in() as:

function in(point::AbstractPoint, polygons)
    for polygon in holes(polygons)
        point in polygon && return false
    end
    # [...]
end

Btw I'm not sure the current implementation of in() for multipolygons is entirely correct. You might want to be careful in testing it first.

@yeesian
Copy link

yeesian commented Feb 3, 2019

Btw it's not a deal-breaker, but it seems to me the current implementation of Polygon can't seem to decide whether it wants to be a Shapefile.Polygon or GeoJSON.Polygon or LinearRing or its own thing. I'll elaborate later when I find the time.

Update: I'm going to postpone the discussion on this until after we agree upon a common geometry model in #166 (comment), since I imagine we want to see something like

Polygon <: ... <: AbstractGeometry

at some point.

@mkborregaard
Copy link
Contributor

It wants to be all of these IMHO.

@mkborregaard
Copy link
Contributor

Why not go one step further and just say that Polygons have to implement a holes() method,

makes sense

@SimonDanisch
Copy link
Member Author

A few refactoring ideas:

  • deprecate HyperCube, since it's basically a HyperRectangle, but with less functionality, and I guess nobody uses it
  • move out dinstancefields.jl
  • move out algorithms.jl (keep normals) - could also just get moved into meshes.jl
  • move all methods into e.g. hyperrectangle.jl instead of having them spread into small files like setops.jl

@andyferris
Copy link

Cool. I think I agree with a lot of what @yeesian has written already. I see this takes the Shapefile approach - the "multipolygon" we've been using is a collection of complex polygons.

(As a side note, this makes me feel that we should really share the code we've written...)

@mkborregaard
Copy link
Contributor

The end goal here is to have a uniform and lightweight representation for geometries throughout the ecosystem. Compasability is a goal, and it would be nice to do as much as possible with concrete types as 1) they shouldn't be too controversial for this case, 2) it's more robust and 3) it will avoid copying/conversions.
So for sure it should be at least immediately compatible with Shapefiles, though not chained to that if that causes problems somewhere else.

we should really share the code we've written...

Now would seem like a good time to do it, I do agree :-)

refactoring ideas:

I don't care so much but my view is that most operations could stay in this package in a separate files, since they rely on the types here and don't involve further dependencies. Also the names shouldn't conflict with names elsewhere. But I'm curious what it would take for @andyferris to think of this as basic enough?

@yeesian
Copy link

yeesian commented Feb 5, 2019

I'd like for us to move towards a single model of geometries that we can all work with [1]. It doesn't have to be comprehensive, but it has to be agreed upon. I'll like for such a definition to be sufficiently straightforward that something like Polygon <: ... <: AbstractGeometry is made possible.

We can add more methods to it as we go, but I want to be prudent/conservative in terms of what is required (versus nice-to-have). I'll be happy to flesh out a more detailed description of such a geometry type if we are all in agreement.

Without further ado: may I propose the following starting point for all of us?

We start with a single AbstractGeometry type. Any subtype of an AbstractGeometry is required to implement

eltype(::AbstractGeometry)
ndims(::AbstractGeometry)
copy(::AbstractGeometry)

I have some remarks (based on my reading of the current GeometryTypes codebase):

  1. For some classes of geometries (e.g. circles), it is unclear what vertices() and nvertices() should mean, so I left them out of the basic definition. (Even in those cases where it is appropriate, I would prefer something like ncomponents(geometry), component(geometry, i) and components(geometry) instead.) But someone can go ahead and implement a subtype like AbstractIterableGeometry <: AbstractGeometry for it.

  2. In addition to iterable geometries (c.f. point 1), there can also be a AbstractMutableIterableGeometry <: AbstractIterableGeometry which requires methods like push!(geometry, point) and deleteat!(geometry, i).

  3. Someone can go ahead and implement a parametric subtype that looks like e.g. AbstractParametricGeometry{T,N} <: AbstractGeometry. But it is not required for AbstractGeometry. I am not comfortable about committing to a parametric abstract type too early in case it makes it too onerous for wrappers of opaque C objects.

@SimonDanisch @mkborregaard @andyferris @visr (and anyone else following this thread with a vested interest in its outcome, thumbs up to this comment if it does not pose a problem for your model of geometries. Otherwise, provide a description of a case where it might be problematic.)

[1]: My personal preference is for us to move towards a trait-like interface of working with geometries. But my understanding from various discussions is that there is still reluctance about it, and greater enthusiasm for building a unified type hierarchy of geometries.


All of the discussion above is based on my reading of AbstractGeometry, GeometryPrimitive , and AbstractFlexibleGeometry. Since GeometryPrimitive is a subtype of AbstractGeometry, I'm going to leave it out of the rest of the discussion in this comment.

For AbstractGeometry, I observe the following methods:

For AbstractFlexibleGeometry, I observe the following methods:

My guess from the docstring of AbstractFlexibleGeometry{G} and implementations of current subtypes of AbstractFlexibleGeometry is that

AbstractFlexibleGeometry{G} ≈ AbstractVector{G} where G <: AbstractGeometry

Therefore, the current type hierarchy is meant to be something like

AbstractGeometry
AbstractImmutableGeometry <: AbstractGeometry
AbstractMutableGeometry <: AbstractGeometry

with the following set of methods:

eltype(::AbstractGeometry)
ndims(::AbstractGeometry)
vertices(::AbstractGeometry)
nvertices(::AbstractGeometry)
copy(::AbstractGeometry)

push!(::AbstractMutableGeometry, point)
deleteat!(::AbstractMutableGeometry, i)

Is that a reasonable reading? What might I be missing?

@visr
Copy link
Member

visr commented Feb 5, 2019

Thanks for writing it up @yeesian! Regarding:

[1]: My personal preference is for us to move towards a trait-like interface of working with geometries. But my understanding from various discussions is that there is still reluctance about it, and greater enthusiasm for building a unified type hierarchy of geometries.

It might be good to list some pros and cons of traits vs type hierarchies for this purpose. I'm more familiar with type hierarchies than traits, so that's what I mainly spoke about. But the success of Tables.jl in providing a common interface to many different table-like types might be transferred to geometries?

@mkborregaard
Copy link
Contributor

mkborregaard commented Feb 6, 2019

So the way I see it there are three possible levels of integration: shared concrete types, abstract types with an informal interface, and traits.

The advantage with concrete types is simplicity the way I see it. Everybody writes functions for the same types, it's easy to reason about what a type contains in a function, and it's thus much easier to write code that is optimised for performance. The issue with abstractions is that they may hide a non-performant implementation, which is exactly why specialized methods throughout the ecosystem tend to be faster. Another advantage is that it avoids copying because you never have to convert types. Another advantage is that if using the same basic concrete types becomes widespread throughout the ecosystem, you will have packages working together without ever having to align anything, or even plan for it. For instance if all plotting packages used these basic types it would be possible to plot shapefiles with any package without needing recipes.

The big disadvantage of course is that you have to agree on the type implementation, so it must be possible to define an implementation that is actually ideal for all purposes; also refactors can be much more complicated if you don't make sure to still code mainly with abstractions such as getter functions.

The advantage of abstract interfaces is that it allows you to have different types and still code to them under the same logic. This is extremely useful if you need to have different types. While abstractions are a lot easier to agree on and implement throughout the ecosystem, it also comes at the cost of unclarity. Say I load a shapefile with Shapefiles which is just julia Vectors of Points, and one with GDAL which wraps a pointer to a GDAL object, then uses LibgGEOS to intersect them - what type of Polygon should I then have? Do I end up with a mix of pure-Julia objects and C pointers etc in my workspace? And how many implicit copies happened by conversion in the process? (Let me know if I misunderstand any of the relationships here).

To be clear what I suggest is having both - i.e. to have an abstract interface, but also attempt to use the same concrete types where possible.

The advantage of traits is that it gives you many of the same advantages as an abstract interface but 1) it allows you to add traits to a type after it has been created (you can't really add a non-package-specific trait to a type created in a different package though, that would be type piracy), and it does not interfere with the existing type hierarchy, i.e. it allows you to have a semblance of multiple inheritance. It's also less simple, and IMHO clunkier to program with than an abstract interface.

Traits are great exactly for something like Tables, because it is an explicit design goal of the Tables ecosystem that there should be many types of Table. Some types are faster for row access, some for columns access, some for big persistent data sets in many different files, some for type stability etc etc. The idea is that the user should choose exactly the type of Table that is best for his/her use case, then be able to apply all the same operations on it. To my mind though that's a very different situation from the Geo case, where we need coherence more than pluralism imho.

I do have a question about design though, that relates to Geos and GDAL etc., given that they are the core of the geographical ecosystem in many languages. Are the C++ objects they use the same, or interchangeable? Or do you need converting?
So given that there is already a huge C++-based geography ecosystem, I see the point of letting that do all the work, and having Julia types simply wrap pointers that they pass around, then applying all functions as translations to the C++ interface and passing to the libraries. In this way Julia is used as a scripting language in exactly the same way as Python or R, and behaving the same way. In principle the implementation and experience could be the same, or it could be different if you guys have ideas that are better than the python developers out there (I bet you do). It would be a gated community, e.g. you convert everything to C++ pointers at the door then keep passing those objects around.

That's fine - julia is perfectly great as a scripting language. But it also has more powerful aspects - so then the question is - is there also space for a (simpler) Julia representation, , with julia types and deep integration with the rest of the ecosystem? I think so, which would be what we would try to build here. The ideal would be if these julia types could use geos and gdal natively, and I think I understood from Simon that it might be possible to define the julia types so they end up with the same memory representation as C types, but it's unclear to me how feasible this would be.

So maybe the most feasible other possibility is to have one ecosystem based on concrete Julia types, another (like what you already have) wrapping the same C++ types (if all the c++ objects are compatible), all connected by an abstract interface?

@mkborregaard
Copy link
Contributor

A single concrete geometry type is sadly unworkable.

Could you elaborate on this point? What is the inheritance tree you're suggesting?

@SimonDanisch
Copy link
Member Author

well the points from @yeesian ;) Even though it'd would be wonderfull to have one type that can rule them all, we can't really plan with it since there will definitely reasons for different layouts.
Buy shallow, I mean just containing as few subtypes as possible ;)

@mkborregaard
Copy link
Contributor

OK but then I don't think we disagree. The idea that it sounds like we're all converging on is to have 1) an ecosystem based around the C libraries that try to be consistent without copying among all the C libraries and 2) simple Julia-based implementations of a subset of the functionality, like Shapefiles.jl, that try to use the same concrete geometry that people like me who might go on a "crusade" to port clipper could work with, and both of these satisfying the same abstract interface, using traits where that is most convenient?
And with good conversion functions but a philosophy to mostly not have to convert.

@SimonDanisch
Copy link
Member Author

That sounds perfectly in sync with what I meant ;)

@mkborregaard
Copy link
Contributor

I don't know GeometryTypes very well but to me @yeesian 's scetch of a type hierarchy sounds reasonable. Philosophically I'm inclined to try to make it as simple as possible.

@yeesian
Copy link

yeesian commented Feb 6, 2019

I mean, the simplest possible thing to do is to just define a

abstract type AbstractGeometry end

and to let other package developers interpret what they will of it, haha.

Multiple subtype hierarchies might exist and nothing is guaranteed to work across subtype hierarchies, but that can be left for subsequent discussions to resolve rather than holding up this PR. (My proposal was just to kickstart that conversation, but it doesn't have to happen here.)

@mkborregaard
Copy link
Contributor

I think we should have at least

AbstractPoint <: AbstractGeometry
AbstractLineSegment <: AbstractGeometry
AbstractPolygon <: AbstractGeometry

might perhaps also be good to have

AbstractComplexPolygon <: AbstractPolygon
AbstractMultiPolygon <: AbstractFlexibleGeometry{<:AbstractPolygon}
AbstractMultiLine <: AbstractFlexibleGeometry{<:AbstractLineSegment}

Not wedded to the design just trying to circumscribe some types.
I agree with @yeesian that it's only necessary to have parametric abstract types when it is needed in order to write specialized functions using the interface.

@andyferris
Copy link

(In case it is of interest, there's now some polygon code (and more) at https://github.com/FugroRoames/RoamesGeometry.jl)

@andyferris
Copy link

Nice discussion.

I do tend to agree with the above that a package defining an abstract type hierarchy for geometry, with a few simple Julia concrete types, is the way to go. We just need to flesh out the abstract interface functions we use to interact with such geometries to the correct level of granularity.

A couple of points on the abstract interface functions - if we use functions from Base, they have to do what they mean in Base:

  • eltype is the one that tells you the type for in and iterate and so-on. To me, an ND geometry is a set of ND points. I would say the eltype of a Sphere, say, is some kind of point, and I would frequently query if point in sphere ... end.
  • ndims is an AbstractArray interface function - beyond that a library writer might extend it to describe how getindex works (for a higher-dimensional dictionary, say). Let's please not overload it, since this will be a mess. If a 3-vector is 3D point, we still have ndims(::AbstractVector) = 1.

Finally, I find that when I want to mutate a geometry, I tend to know its concrete type. If you want to push! points to a linestring, say, then I'd just push!(point, linestring.vertices) (or else push!(point, vertices(linestring)) if we decide to have a vertices(::AbstractLineString) interface function). So I wouldn't spend effort adding an abstract type heirarchy for mutability when a few traits and abstract getters will do the job well.

@mkborregaard
Copy link
Contributor

Good points!
This is a detail but just to check (as it is relevant for the PR): Would you expect if point in sphere == true if the point was inside the sphere, or if it were on the edge? Also, is a requirement of in that you can do for point in sphere... which is undefined here?

@mkborregaard
Copy link
Contributor

Answers to my own questions. 1 you mean within, like in the pr. 2. Is not a problem, as sphere does not need to be iterable.
Nice package, really like the structure

@asinghvi17
Copy link

@SimonDanisch @mkborregaard is this ready to merge / are there any TODOs that we could pick up? @visr et al were considering moving over to GeometryTypes once it's ready.

@visr
Copy link
Member

visr commented Aug 6, 2019

Yeah I wonder. I see that there is a Polygon in GeometryBasics.jl. Should we integrate with that package or do you think it will still be merged into this one?

@SimonDanisch
Copy link
Member Author

I plan to switch everything to GeometryBasics :)
Maybe we can make it feature complete together? It's already in a pretty good shape, but still needs a few algorithms copy & pasted over from GeometryTypes & more tests and docs!

@visr
Copy link
Member

visr commented Aug 6, 2019

Ok good to know. In that case I guess we best try hopping on the basic bandwagon straightaway and see what we run into ;)

@SimonDanisch
Copy link
Member Author

Ok don't hesitate to spam me with questions ;) I'd love to get this all to work nicely together!

@sjkelly
Copy link
Member

sjkelly commented Aug 28, 2019

I would like to implement this paper for fast point-in-poly.

Do we want to move GeometryBasics to this org?

@SimonDanisch
Copy link
Member Author

@sjkelly I moved it! :)

@mkborregaard
Copy link
Contributor

so awesome - action on this PR :-)

@SimonDanisch
Copy link
Member Author

I'm going to merge this, even though its incomplete :D I still plan to move the "real refactor" into GeometryBasics.

@SimonDanisch SimonDanisch merged commit 4f1ba71 into master Mar 2, 2020
@SimonDanisch SimonDanisch deleted the sd-refactor branch March 2, 2020 09:44
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants