-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor into C++ #56
Conversation
The main entry-point file now uses Cython. It's basically a stright-translaton from the original C. The code is not idiomatic and is rather raw/ugly at this time. As more functionality is moved to Cython the main file will become cleaner.
This is to avoid name conflicts within Cython. On import in python-land the name is remapped to int and float, respectively.
Currently, only code to query an object type or see if an object is a number is implemented. The functions have been removed from the C code. Cython does not natively allow multiple .pyx files to comprise an .so file, so to workaround this we include the non-main .pyx files into main.pyx, rather than import them. It's a bit hacky, but it gets the job done.
We will migrate to using C++ for non-cython code going forward.
The goal now is that most logic will be done by C++ classes, and cython will be used to only handle conversions to python objects and error handling (which is challenging to do by hand).
This class is designed to accept python objects that are numeric, and determine the numeric type and how to best convert them if conversion is needed. This class is implemented as header-only since it is pretty small. A Payload class is also included in this commit, but not yet used. The NumericAnalyzer has been hooked into the query_types function - it will be used in other function in future commits.
There is now an Evaluator class, and a Parser hierarchy. The Evaluator constructs the correct Parser subclass, and then that is used to do the acual analysis. The Evaluator encodes some logic.
There are still LOTS of bugs and missing functionality, but this is a good skeleton and commit point.
Invalid unicode data should raise ValueError, not TypeError.
If base was not given, it was set to NULL, which would cause a segfault when converting to Py_ssize_t. Explict NULL checking is now done.
A handful of older C-files are no longer used in the current refactor. These have been removed to make it obvious what is actually going on.
If an overflow might occur, the parsing is dispatched to Python for more accurate handling.
There were two issues - If the base was explictly given it was not correctly disabling unicode parsing - It was converting numeric values to integers erroneously
If a class defines __int__ or __float__ or __index__, this is now supported. To clarify internal workings, "not_numeric" is now "not_float_or_int".
If a length was given to vector::reserve is too large, it caused a segfault. std::size_t was being used for the return of PyUnicode_GET_LEGNTH instead of Py_ssize_t, and these have a sign difference. The working hypothesis is there was an overflow happening that caused a very large number to be passed to vector::reserve. Using the correct types will hopefully avoid segfaults going forward.
Now taking advantage of templates and other C++ nicities. Awesome.
The PyNumberType enum was moved to evaluator.hpp and renamed to UserType. Some other enum values have been renamed to make all enum values unique in the Cython file.
This will make it easier to manager removing underscores if necessary.
If the input was of zero-length or a single whitespace, the previous algorithm would inadvertently store the end pointer before the start pointer. While we are at it, change the parser to store length internally instead of the end pointer, which is a bit cleaner.
Underscore parsing will allocate new memory, but it is assumed this is a low-probability event so this is deemed an OK happening.
For C++, we are now using clang-format instead of astyle
Instead of being a single class with switch statements used to choose correct behavior, a class hierarchy is used with overridden functions. This makes the functions a bit easier to understand, and will facilitate more flexible future enhancements.
In the previous C implementation, if a string was too long to convert naively the character array was passed to a python conversion funciotn. Prior to this commit in the C++ implemenation, if the character array was too long then the original object was given to the python conversion function. This was a waste of time because the character array then had to be re-extracted under the hood. Now, the extracted character array is used as it was in the C implementation. Furthermore, the code has actually been simplified because now we can return Python objects in the Payload, rather than sending an action signal indicating how to treat the python object. This removes a lot of unnecessary and ugly code.
Separate out the exception raising into a function so that excpetion messages are only built if the exception is going to be raised.
Numbers are pre-checked for correctness before sending to the python conversion functions because that is faster than creating an exception object when an error does occur. In the process, some code was refactored to remove copy/paste.
It turns out that Cython adds some overhead that makes fastnumbers just "numbers", lol.
The changes boil down to the following - Addition of the UserOptions object to store user options - Creation of the TextExtractor object to perform the actual extraction of text (pulled out of the Evaluator) - The Parser objects are now injected into the Evaluator, and are now created from the TextExtractor
The final details of what to return to the user are now owned by the Payload object, as part of the resolve() function.
Instead of storing a pointer so that we can use polymorphism, we are using templates and a reference to the Parser in the Evaluator class. Ideally, this will be slightly more performant because the compiler will no longer need to make indirect calls. As part of this, some of the methods of the Parser base class have been made pure virutal since there is now no chance of instantiating a Parser.
Remove the union, convert to PyObject* on construction, not on request.
d6b1cfd
to
d49d96c
Compare
Codecov ReportBase: 95.61% // Head: 97.60% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #56 +/- ##
==========================================
+ Coverage 95.61% 97.60% +1.99%
==========================================
Files 6 4 -2
Lines 1048 668 -380
==========================================
- Hits 1002 652 -350
+ Misses 46 16 -30
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
e148397
to
cdc70db
Compare
This is EOL, and is no longer supported by the testing infrastructure.
cdc70db
to
30c80a3
Compare
This is deprecated.
This reverts commit ebaae51.
12c6c6e
to
1a016b2
Compare
365ef4e
to
d2b4800
Compare
This will close #45.
Feature neutral, but the code is now a bit more object oriented which makes it more manageable and extendable.