OSError Invalid argument when reading shape file #235
Interesting. First, can you report the pyshp version and give the full Python stack trace/error message (not just the last line)? Encoding errors are not typically OSError, so I think it's more likely related to what you say about byte unpacking/arguments when reading the .shp file. In the for-loop, do you do any additional processing of the shapefile, or did you show the whole loop? Ideally, is there a public link or GitHub repo where I can access the file? Or, depending on the file size, could you upload it as an attachment or send a file-sharing link?
No, this is just a smoke-test script that opens every shapefile we produce to check that it can be read and that the number of features is not 0. Version installed: Stack trace (I ran it from an interpreter session, which is why there is no script line):
The file is created using Java: Just verified: still reproducible in the latest version of PyShp.
Thanks for providing the file and the other details. After some testing, here is a rather lengthy response, which can be summarized as: the file contains corrupted data towards the end. It appears the error happens when trying to read beyond shape #135307, i.e. the error is with #135308:
It looks like the shapefile has a length of 135308, but the reader is trying to read beyond the end of the file:
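The record count a reader should expect can be taken from the .shx index header rather than from the size of the .shp file. A minimal stdlib sketch, assuming the standard ESRI layout (a 100-byte header whose big-endian int at byte 24 is the file length in 16-bit words, followed by one 8-byte entry per record); `shx_record_count` is a hypothetical helper, not a pyshp function.

```python
import struct


def shx_record_count(shx_header: bytes) -> int:
    """Return the number of records implied by a .shx file header.

    Bytes 24-27 hold the file length as a big-endian int counted in 16-bit
    words (header included); every index entry after the 100-byte header is
    8 bytes long.
    """
    if len(shx_header) < 100:
        raise ValueError("a .shx header is 100 bytes long")
    (file_length_words,) = struct.unpack(">i", shx_header[24:28])
    return (file_length_words * 2 - 100) // 8


# Hypothetical header for an index of 135308 records:
# 100 + 135308 * 8 = 1082564 bytes = 541282 words.
header = bytearray(100)
struct.pack_into(">i", header, 24, 541282)
print(shx_record_count(bytes(header)))  # 135308
```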
The error happens only with
Reading the actual contents of the last shape confirms that the end of the last shape should be at byte position 18462948:
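The end position of a record can also be computed from its .shx entry alone: each entry holds the record's offset and content length as big-endian ints in 16-bit words, and each record in the .shp starts with an 8-byte record header (record number, content length). A minimal sketch under those assumptions; `record_end` is a hypothetical name.

```python
import struct


def record_end(shx: bytes, index: int) -> int:
    """Return the .shp byte position just past record `index` (0-based)."""
    entry = 100 + 8 * index  # index entries start after the 100-byte header
    offset_words, content_words = struct.unpack(">2i", shx[entry:entry + 8])
    # record start + 8-byte record header + content, all converted to bytes
    return offset_words * 2 + 8 + content_words * 2


# Two hypothetical index entries: a record at byte 100 with 20 bytes of
# content, then one at byte 128 with 28 bytes of content.
shx = bytes(100) + struct.pack(">2i", 50, 10) + struct.pack(">2i", 64, 14)
print(record_end(shx, 0))  # 128
print(record_end(shx, 1))  # 164
```

Applied to the last index entry (`index = count - 1`), this gives the position to compare against the length declared in the .shp header.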
Similarly, the shapefile header agrees that the length of the file should be 18462948:
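Comparing the header's declared length with the actual size on disk shows whether a file carries extra bytes after its last valid record. A minimal sketch, assuming the caller reads only the 100-byte header and obtains the size separately (for example with `os.path.getsize`); `trailing_bytes` is hypothetical.

```python
import struct


def trailing_bytes(shp_header: bytes, actual_size: int) -> int:
    """Return how many bytes exist beyond the length declared in a .shp header."""
    (file_code,) = struct.unpack(">i", shp_header[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile header (file code != 9994)")
    # File length at bytes 24-27: big-endian int, counted in 16-bit words.
    (file_length_words,) = struct.unpack(">i", shp_header[24:28])
    return max(0, actual_size - file_length_words * 2)


# Hypothetical header declaring 124 bytes (62 words) for a 136-byte file.
header = bytearray(100)
struct.pack_into(">i", header, 0, 9994)
struct.pack_into(">i", header, 24, 62)
print(trailing_bytes(bytes(header), 136))  # 12
```

For the file in this thread, the header declares 18462948 bytes; a non-zero result would be the size of the unexpected data after that point.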
Trying to read the beginning of the next shape results in unexpected data:
So this is either an issue of the writer erroneously writing too much data, or of the header containing incorrect information, depending on how you view it. Since both the shx and dbf headers say the file length should be 135308, and since the beginning of the next shape contains unexpected data, I am inclined to believe that the extra data got corrupted in some way and that the headers represent the correct number of non-corrupted shapes/records. If I were to speculate, I wonder if the Java writer tried to write additional data and, when it failed for some reason, wrote the number of successfully written entries to the header but did not remove the partially written extra data from the files. Looking at how
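The argument rests on the .shx and .dbf headers agreeing on the record count, which is cheap to check independently of the .shp. A minimal sketch, assuming the standard dBASE header layout (record count as a little-endian unsigned 32-bit int at bytes 4-7); all three helpers are hypothetical.

```python
import struct


def shx_record_count(shx_header: bytes) -> int:
    # .shx file length (bytes 24-27) is big-endian, in 16-bit words;
    # each index entry after the 100-byte header is 8 bytes.
    (file_length_words,) = struct.unpack(">i", shx_header[24:28])
    return (file_length_words * 2 - 100) // 8


def dbf_record_count(dbf_header: bytes) -> int:
    # dBASE header: number of records is a little-endian uint32 at bytes 4-7.
    (count,) = struct.unpack("<I", dbf_header[4:8])
    return count


def counts_agree(shx_header: bytes, dbf_header: bytes) -> bool:
    """True when the index and the attribute table declare the same record count."""
    return shx_record_count(shx_header) == dbf_record_count(dbf_header)


shx_header = bytes(24) + struct.pack(">i", 541282) + bytes(72)  # 135308 records
dbf_header = bytes(4) + struct.pack("<I", 135308) + bytes(24)
print(counts_agree(shx_header, dbf_header))  # True
```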
This does highlight, however, that pyshp's approach to reading corrupted files like this should be updated (see also #147, #223). It looks like a previous decision was made to trust the actual length of the shapefile rather than the number of shapes specified in the shapefile header, according to a comment in the code: In short, the question boils down to whether we trust the number of shapes as stated in the header or as implied by the length of the file. Since the dbf reader goes by the header information, and since
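The "trust the header" option can be sketched as iterating over exactly the number of records the .shx declares, locating each record through its index entry instead of reading until end-of-file, so trailing bytes are never parsed. This is a minimal stdlib sketch of that policy, not pyshp's implementation; `iter_records` is hypothetical.

```python
import struct
from typing import Iterator, Tuple


def iter_records(shp: bytes, shx: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (record_number, content) for exactly as many records as the .shx declares."""
    (shx_words,) = struct.unpack(">i", shx[24:28])
    count = (shx_words * 2 - 100) // 8
    for i in range(count):
        entry = 100 + 8 * i
        offset_words, _ = struct.unpack(">2i", shx[entry:entry + 8])
        pos = offset_words * 2
        # Each .shp record: big-endian record number and content length (words).
        rec_num, content_words = struct.unpack(">2i", shp[pos:pos + 8])
        yield rec_num, shp[pos + 8:pos + 8 + content_words * 2]


# Hypothetical files: two 4-byte null-shape records, then 12 bytes of garbage.
null_shape = struct.pack("<i", 0)  # shape type 0 (Null Shape)
shp = (bytes(100)
       + struct.pack(">2i", 1, 2) + null_shape
       + struct.pack(">2i", 2, 2) + null_shape
       + b"\xff" * 12)
shx = (bytes(24) + struct.pack(">i", 58) + bytes(72)
       + struct.pack(">2i", 50, 2) + struct.pack(">2i", 56, 2))
print([num for num, _ in iter_records(shp, shx)])  # [1, 2]
```

The garbage after the second record is never read, because iteration is bounded by the index rather than by the file size.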
One possible problem is that in the absence of the shx file, we don't know the number of shapes in the .shp file. A possible solution:
It would also be possible to skip through all shapes to count them and use this to set the
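Without a .shx, the count can be obtained by walking the record headers from byte 100 up to the length declared in the .shp header, stopping at the first header that cannot be valid. A minimal sketch under the standard layout; `count_records_without_shx` is hypothetical, and stopping quietly rather than raising is one possible design choice.

```python
import struct


def count_records_without_shx(shp: bytes) -> int:
    """Count .shp records by walking record headers, without an index file."""
    (file_length_words,) = struct.unpack(">i", shp[24:28])
    # Never read past either the declared length or the real end of the data.
    end = min(file_length_words * 2, len(shp))
    pos, count = 100, 0
    while pos + 8 <= end:
        _, content_words = struct.unpack(">2i", shp[pos:pos + 8])
        next_pos = pos + 8 + content_words * 2
        if content_words < 0 or next_pos > end:
            break  # implausible record header: stop counting here
        pos, count = next_pos, count + 1
    return count


null_shape = struct.pack("<i", 0)
records = (struct.pack(">2i", 1, 2) + null_shape
           + struct.pack(">2i", 2, 2) + null_shape)
garbage = b"\xff" * 12
# Header declares 124 bytes (62 words): the garbage lies beyond it.
shp_ok = bytes(24) + struct.pack(">i", 62) + bytes(72) + records + garbage
# Header declares 136 bytes (68 words): the garbage is inside the declared length.
shp_bad = bytes(24) + struct.pack(">i", 68) + bytes(72) + records + garbage
print(count_records_without_shx(shp_ok), count_records_without_shx(shp_bad))  # 2 2
```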
Hi, I've encountered similar errors with v2.3.0, and in my case I could trace them back to an issue with unpacking the byte array containing the recNum and recLength in On the post of 17 March 2022, a similar problem is observed, where I haven't done a thorough analysis of this issue, so I cannot propose a decent PR, but it could give insight to consider for the next version.
Thanks for looking into this, and for the workaround, @jjuch. The difference between The original shapefile spec is quite clear that integers should be signed, by the way:
Nonetheless, there are precedents in PyShp and beyond for being tolerant of shapefiles that, strictly speaking, are non-compliant. Adding support for such files could well boil down to a single character. Do you have an example shapefile, and do you know where it originally came from?
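Assuming the record header is unpacked with Python's `struct` module, the "single character" would be the format code: `i` reads a signed 32-bit integer (as the spec requires) and `I` an unsigned one. A minimal illustration with a made-up 8-byte record header whose content-length field has its high bit set:

```python
import struct

# Hypothetical record header: record number 1, content length 0xFFFFFFF0.
record_header = bytes.fromhex("00000001fffffff0")

signed = struct.unpack(">2i", record_header)    # big-endian signed, per the spec
unsigned = struct.unpack(">2I", record_header)  # big-endian unsigned, more tolerant

print(signed)    # (1, -16)
print(unsigned)  # (1, 4294967280)
```

Reading the field as unsigned only turns a negative length into a very large one, so it helps for files that genuinely use the high bit; it cannot repair a record header that is simply corrupt.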
OK, my bad: it was a different issue in the shape's record header, where the recLength was incorrect, so one of the vertices' coordinates was read as the next recLength, producing negative values. Thank you for the fast reply, though!
On some shapefiles containing data from OSM (possibly due to encoding or special characters in the road name column), I am getting OSError: Invalid argument.
When I debugged it, I saw that after reading the last feature the reader is still not at the end of the file, so iterShapes() assumes there are more shapes and tries to read the next one:
Then, in __shape(), the parsed recLength is garbage (some negative number); the reader uses it to compute the next record's position, which ends up somewhere beyond the file's valid range.
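A reader could guard against this by validating each record header before using it to compute the next position, failing with a descriptive error instead of an OSError from a bad seek. A minimal sketch, assuming the .shp content is available as bytes and `end` is the header-declared length; `next_record_position` is hypothetical and is not pyshp's `__shape()`.

```python
import struct


def next_record_position(shp: bytes, pos: int, end: int) -> int:
    """Return the byte position of the record after the one starting at `pos`.

    Raises ValueError instead of producing a nonsensical position when the
    record header holds an invalid record number or content length.
    """
    if pos + 8 > end:
        raise ValueError(f"no room for a record header at byte {pos}")
    rec_num, content_words = struct.unpack(">2i", shp[pos:pos + 8])
    next_pos = pos + 8 + content_words * 2
    # Record numbers start at 1; lengths must be non-negative and stay in bounds.
    if rec_num < 1 or content_words < 0 or next_pos > end:
        raise ValueError(
            f"implausible record header at byte {pos}: "
            f"recNum={rec_num}, recLength={content_words}"
        )
    return next_pos


shp = (bytes(100) + struct.pack(">2i", 1, 2) + struct.pack("<i", 0)
       + b"\xff" * 8)  # garbage where a second record header would be
print(next_record_position(shp, 100, len(shp)))  # 112
try:
    next_record_position(shp, 112, len(shp))
except ValueError as exc:
    print(exc)  # implausible record header at byte 112: recNum=-1, recLength=-1
```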
Since the file has 600K features, I don't really know how to help further.
What is strange, though, is that QGIS and GeoPandas are able to parse the same file.