Remove colons from NSQ filenames #125

Closed
davidstoker opened this issue Feb 14, 2018 · 1 comment

@davidstoker
Contributor

I recently deployed the S3 loader in a new setup with part of it working like this:

Scala stream collector -> NSQ topic -> S3 Loader -> S3

This works as expected, with files written to the S3 bucket under names like this:
2018-02-14-03:52:55.281-03:54:21.489-1579626703.lzo

The ETL process was then kicked off using EmrEtlRunner but failed immediately on the first step with errors like this:

Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: 2018-02-14-03:52:55.281-03:54:21.489-1579626703.lzo

The error was cryptic at first, but the cause is clear in hindsight: HDFS treats colons as special characters. The time format defined here (https://github.com/snowplow/snowplow-s3-loader/blob/master/src/main/scala/com.snowplowanalytics.s3/loader/NsqSourceExecutor.scala#L80) is causing the problem. The Kinesis sink uses different file name handling, so the problem doesn't exist there.
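
The failure can be reproduced without a cluster. The sketch below assumes, based on the error message rather than on anything confirmed in this thread, that Hadoop-style path parsing reads the text before the first colon as a URI scheme and hands the remainder to `java.net.URI` as the path. `ColonPathDemo` and `parseLikeHadoop` are hypothetical names, and only JDK classes are used:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ColonPathDemo {
    // Hypothetical helper (an assumption, not Hadoop's actual code): splits
    // "scheme:rest" the way a Hadoop-style path parser is assumed to do
    // before building a java.net.URI.
    public static URI parseLikeHadoop(String name) {
        int colon = name.indexOf(':');
        int slash = name.indexOf('/');
        String scheme = null;
        String path = name;
        // A colon that appears before any slash is read as a scheme separator.
        if (colon != -1 && (slash == -1 || colon < slash)) {
            scheme = name.substring(0, colon);
            path = name.substring(colon + 1);
        }
        try {
            // When a scheme is present, java.net.URI rejects a path
            // that does not start with '/'.
            return new URI(scheme, null, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String withColons = "2018-02-14-03:52:55.281-03:54:21.489-1579626703.lzo";
        String withoutColons = "2018-02-14-035255281-035421489-1579626703.lzo";
        try {
            parseLikeHadoop(withColons);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getCause().getMessage());
        }
        System.out.println("accepted: " + parseLikeHadoop(withoutColons).getPath());
    }
}
```

Under that assumption, the first name is split into scheme `2018-02-14-03` and a relative path, which `java.net.URI` refuses, while the colon-free name is parsed as an ordinary relative path.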

I would be glad to open a PR since it's a simple fix, but I wanted to capture this as an issue for anyone else who hits the problem. I'm currently running a fork that has the time format changed to HHmmssSSS instead of HH:mm:ss.SSS.
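
As a minimal sketch of that change, the snippet below builds a file name with both time formats. It uses `java.time` purely for illustration, which may not be the date library the loader actually uses, and the name layout (date, start time, end time, numeric suffix) is inferred from the example file name above. `NsqFileNameDemo` and `fileName` are hypothetical names:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class NsqFileNameDemo {
    // Date prefix shared by both variants.
    static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Hypothetical helper: assembles date-start-end-suffix.lzo using the
    // given time-of-day pattern for the start and end timestamps.
    public static String fileName(String timePattern, LocalDateTime start,
                                  LocalDateTime end, long suffix) {
        DateTimeFormatter time = DateTimeFormatter.ofPattern(timePattern);
        return DATE.format(start) + "-" + time.format(start) + "-"
                + time.format(end) + "-" + suffix + ".lzo";
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.of(2018, 2, 14, 3, 52, 55, 281_000_000);
        LocalDateTime end = LocalDateTime.of(2018, 2, 14, 3, 54, 21, 489_000_000);
        // Original format, contains colons:
        // 2018-02-14-03:52:55.281-03:54:21.489-1579626703.lzo
        System.out.println(fileName("HH:mm:ss.SSS", start, end, 1579626703L));
        // Format used in the fork, no colons:
        // 2018-02-14-035255281-035421489-1579626703.lzo
        System.out.println(fileName("HHmmssSSS", start, end, 1579626703L));
    }
}
```

The colon-free pattern keeps the same digits in the same order, so file names still sort chronologically.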

@alexanderdean
Member

Ouch, good catch @davidstoker ! A PR would be most welcome. We shouldn't be using :s in filenames in any case (we don't with Kinesis)...

@BenFradet added this to the Version 0.7.0 milestone on Feb 20, 2018
@BenFradet changed the title from "NSQ Filename Breaks EmrEtlRunner process" to "Remove colons from NSQ filenames" on Feb 20, 2018