
Add RTEMS ntpd support #528

Open
wants to merge 1 commit into
base: 7.0


kiwichris
Contributor

@kiwichris kiwichris commented Jul 18, 2024

With RTEMS 6 these changes require libntp, which is available in the rtems-net-services package. The repository is hosted on https://gitlab.rtems.org and builds for both the legacy and libbsd networking stacks.

RTEMS 5 builds the default POSIX support, because RTEMS 5 does not have kernel NTP support.

Configure ntpd with the following env variables:

  • EPICS_TS_NTP_INET : NTP server
  • EPICS_TS_NTP_CONF_FILE : ntpd configuration file
  • EPICS_TS_NTP_LEAP_SECONDS_FILE : ntpd leap seconds file

A configuration file is used if both a configuration file and server IP address are set.
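As a sketch of that precedence (configuration file first, then a single server address, otherwise ntpd is not started), assuming an empty variable is treated as unset. The enum and function names are hypothetical and are not taken from the PR's osdTime.cpp:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names; they only mirror the precedence described above. */
typedef enum {
    NTP_CONFIG_NONE,   /* nothing set: ntpd is not started */
    NTP_CONFIG_SERVER, /* only EPICS_TS_NTP_INET: build a minimal ntp.conf */
    NTP_CONFIG_FILE    /* EPICS_TS_NTP_CONF_FILE: use that file as is */
} ntp_config_source;

/* Assumption: an empty string is treated the same as an unset variable. */
static int is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/* The configuration file wins when both are set. */
ntp_config_source ntp_select_config(const char *conf_file, const char *server)
{
    if (is_set(conf_file))
        return NTP_CONFIG_FILE;
    if (is_set(server))
        return NTP_CONFIG_SERVER;
    return NTP_CONFIG_NONE;
}

/* Read both variables from the environment and apply the rule above. */
ntp_config_source ntp_select_config_from_env(void)
{
    return ntp_select_config(getenv("EPICS_TS_NTP_CONF_FILE"),
                             getenv("EPICS_TS_NTP_INET"));
}
```

Keeping the decision in a pure function that takes the two strings makes the rule testable without touching the real environment.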

This PR depends on #375. I do not know how to make a merge train on GitHub.


@anjohnson
Member

Core Group comment, not a review/request: This PR contains an NTP client for use by RTEMS-5 targets. We might not need to support RTEMS-5 after RTEMS-6 has been released and integrated into Base. There seems to be an issue with the current osdTime code on RTEMS-posix that could be fixed to allow RTEMS-5 to use the EPICS-provided NTP time provider (less accurate than a proper NTP client), but that might be sufficient for any remaining RTEMS-5 users.

@kiwichris kiwichris force-pushed the rtems-ntpd-7_0-with-5-and-6 branch from e227857 to 08d3b2d February 1, 2025 01:18
@kiwichris
Contributor Author

I have reviewed the warnings the static checker has raised and they seem like noise to me.


@mdavidsaver
Member

mdavidsaver commented Feb 2, 2025

Firstly, is this PR relevant now that RTEMS 6.1 is released?

I have reviewed the warnings the static checker has raised and they seem like noise to me.

Is the added RTEMS-posix/osdTime.cpp new code for this PR, or taken from some other source?

If it is new, and would be unique to epics-base, then I think it would be a worthwhile exercise to follow up on these warnings. I agree that the texts of these Codacy warnings are unhelpful, but I think several are actionable in that what is being attempted could be accomplished in simpler ways.

There is a lot of C string manipulation being added with this PR, with zero test coverage. This makes me nervous.


@kiwichris
Contributor Author

Firstly, is this PR relevant now that RTEMS 6.1 is released?

I have reviewed the warnings the static checker has raised and they seem like noise to me.

Is the added RTEMS-posix/osdTime.cpp new code for this PR, or taken from some other source?

It is new, and I have reviewed each case again. My comments are below.

If it is new, and would be unique to epics-base, then I think it would be a worthwhile exercise to follow up on these warnings. I agree that the texts of these Codacy warnings are unhelpful, but I think several are actionable in that what is being attempted could be accomplished in simpler ways.

Case 1:

The checker suggests the memory needs to be cleared after use. The memory is a local variable on the calling stack and goes out of scope when the called function returns. The function is static, so the compiler will handle this call in module-wide optimization and I doubt a real call actually exists. I can add a clear at the end of the block that uses the memory to see if this helps, but it implies the checker is weak in some areas.

Case 2:

The buffer pointed to by buf is initialized by calls to strlcpy and strlcat to make sure there is no buffer overrun. The strlen call is not protected because the other calls are considered OK. It is a shame the analyzer cannot see that. Using strnlen may avoid the warning.
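A minimal sketch of the same pattern, swapping in snprintf() for strlcpy()/strlcat() (which are not part of ISO C) and bounding the later length check with strnlen(). The join_path() and bounded_len() helpers are hypothetical, not code from the PR:

```c
#define _POSIX_C_SOURCE 200809L /* strnlen() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: join a directory and a file name into buf.
 * Returns the length written, or -1 if the result would not fit. */
int join_path(char *buf, size_t bufsz, const char *dir, const char *name)
{
    int n;

    assert(buf != NULL && bufsz > 0);
    n = snprintf(buf, bufsz, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= bufsz) {
        buf[0] = '\0'; /* do not leave a truncated path behind */
        return -1;
    }
    return n;
}

/* strnlen() never scans past bufsz, even if a later edit breaks the
 * termination guarantee the snprintf() call above provides. */
size_t bounded_len(const char *buf, size_t bufsz)
{
    return strnlen(buf, bufsz);
}
```

snprintf() reports the length it wanted to write, so truncation is detected in one comparison instead of being inferred after the fact.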

Case 3:

I will add a length to the call using sizeof(a_string).

Case 4:

Looking at this again, I think the write call should use the size returned from the read call (s) if it is positive.
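A sketch of Case 4 as described, assuming a plain POSIX file-descriptor copy: each write() uses the count the preceding read() returned, and partial writes are retried. copy_fd() is a hypothetical name, not the PR's function:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd. Each write uses the byte count the
 * preceding read() returned, never sizeof(buf). Returns 0 on success,
 * -1 on a read or write error. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[512];

    for (;;) {
        ssize_t s = read(in_fd, buf, sizeof buf);
        if (s == 0)
            return 0; /* end of file */
        if (s < 0) {
            if (errno == EINTR)
                continue; /* interrupted, try again */
            return -1;
        }
        /* write() may be partial, so loop until all s bytes are out. */
        for (ssize_t off = 0; off < s; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(s - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}
```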

Any suggestions on how to check the changes without pushing to this PR?

There is a lot of C string manipulation being added with this PR, with zero test coverage. This makes me nervous.

How would test coverage to this code be added to EPICS?

Chris

@mdavidsaver
Member

Looking more closely... do I understand that the struct ntpq_rl_data parsing and printing is entirely local to osdTime.cpp? Is this reformatting really worth the significant extra code complexity? Why not just print the output of ntpq verbatim?

@mdavidsaver
Member

For that matter, could the whole ntpd setup be externalized through the RTEMS shell? (exposed via rtems ... from the IOC shell)

I don't like seeing configuration file contents hard-coded into epics-base as it prevents, or at least complicates site/instance configuration. eg. your osdNTP_Monitor thread looking for changes to environment variables, and overwriting /etc/... files.

@mdavidsaver
Member

How would test coverage to this code be added to EPICS?

epics-base has a framework for unit-testing, and many examples in the tree. eg. modules/libcom/test/.

There is so far no modules/libcom/RTEMS/test/, but that would be one obvious place to put test of librtemsCom.a.

For code like this string parsing, which is not really RTEMS specific, I might structure it to build and run also as a host executable (with the happy side effect of being much easier to develop and debug...).
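To illustrate the host-executable idea, a hypothetical pure-C lookup of one key in the `rl` text. It handles only unquoted values (a quoted value containing commas or spaces would be cut short), and in-tree it would presumably be exercised through the existing unit-test framework rather than bare assert():

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: find "key=" in a comma/space separated ntpq `rl`
 * reply and copy its value into out. Returns 1 if found, 0 otherwise.
 * Pure string code, so it builds and runs on the host as well as RTEMS. */
int rl_get(const char *rl, const char *key, char *out, size_t outsz)
{
    size_t klen = strlen(key);
    const char *p = rl;

    assert(out != NULL && outsz > 0);
    while ((p = strstr(p, key)) != NULL) {
        /* Match whole keys only: at the start or after ' ', ',' or '\n'. */
        int at_start = (p == rl) || p[-1] == ' ' || p[-1] == ',' || p[-1] == '\n';
        if (at_start && p[klen] == '=') {
            const char *v = p + klen + 1;
            size_t n = strcspn(v, ", \r\n"); /* value ends at a separator */
            if (n >= outsz)
                n = outsz - 1; /* truncate rather than overrun */
            memcpy(out, v, n);
            out[n] = '\0';
            return 1;
        }
        p += klen;
    }
    return 0;
}
```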

@kiwichris kiwichris force-pushed the rtems-ntpd-7_0-with-5-and-6 branch from 08d3b2d to 532b0f6 February 4, 2025 03:02
@kiwichris
Contributor Author

For that matter, could the whole ntpd setup be externalized through the RTEMS shell? (exposed via rtems ... from the IOC shell)

I don't like seeing configuration file contents hard-coded into epics-base as it prevents, or at least complicates site/instance configuration. eg. your osdNTP_Monitor thread looking for changes to environment variables, and overwriting /etc/... files.

Maybe the comment at the start of the file will help:

 * NTPD is configured by a file /etc/ntp.conf. The RTEMS /etc/ directory
 * does not survive a reset so ntp.conf is created each boot.
 *
 * Users can configure ntpd by:
 *
 *  None:
 *     ntpd is not started
 *
 *  BOOTP/DHCP:
 *     The BOOTP server IP address is used if there is a valid BOOTP
 *     record. This is currently not supported because it is not
 *     implemented with the libbsd stack.
 *
 *  NTP Server IP:
 *     The environment variable EPICS_TS_NTP_INET is used as the
 *     address if no configuration file is set.
 *
 *  Configuration File:
 *     The environment variable EPICS_TS_NTP_CONF_FILE is a path to a
 *     configuration file that is used as is. This allows site specific
 *     and even board specific configuration support at run time. It
 *     is recommended you provide a site specific configuration file
 *     if you need site specific control.
 *
 *  The Configuration File option provides a system level means to
 *  configure a site specific ntpd configuration that can be loaded
 *  from a NFS file system when the system starts. This option
 *  provides the ability to implement a site specific configuration of
 *  ntpd.

The config file fragment is present to provide backward compatibility for existing users who want a simple NTP setup similar to the simple client Eric and others provided years ago. I suggest you provide external configuration files suited to your system, including the leap seconds file.
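For reference, a sketch of what a generated single-server fragment might look like, assuming standard ntp.conf directives (restrict, server ... iburst, leapfile) and limiting ntpq queries to the loopback address. The exact lines the PR writes may differ, and ntp_conf_for_server() is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: build ntp.conf text for a single server.
 * "noquery" on the default entry blocks remote ntpq queries, while the
 * unrestricted 127.0.0.1 entry still allows local monitoring.
 * Returns the text length, or -1 if buf is too small. */
int ntp_conf_for_server(char *buf, size_t bufsz, const char *server,
                        const char *leapfile)
{
    int n;

    assert(buf != NULL && bufsz > 0);
    n = snprintf(buf, bufsz,
                 "restrict default kod nomodify notrap nopeer noquery\n"
                 "restrict 127.0.0.1\n"
                 "server %s iburst\n"
                 "%s%s%s",
                 server,
                 leapfile ? "leapfile " : "",
                 leapfile ? leapfile : "",
                 leapfile ? "\n" : "");
    return (n < 0 || (size_t)n >= bufsz) ? -1 : n;
}
```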

If you want EPICS to break backward compatibility, just say so, but I did not think EPICS did that sort of thing, so I went to the trouble of finding a solution.

@kiwichris
Contributor Author

Looking more closely... do I understand that the struct ntpq_rl_data parsing and printing is entirely local to osdTime.cpp?

Yes.

[ Also I have attempted to deal with the checker but it seems there is a limit to how far it can see. The buffer is being cleared and the read length is calculated the line before based on the buffer size and the write uses the read length (which is checked against the length passed to read). ]

Is this reformatting really worth the significant extra code complexity?

The NTP client is replacing what existed. If you do not want this monitoring and reporting of the NTP client state, it can be removed or disabled by default. Users can be pointed to the ntpd documentation for solutions.

Why not just print the output of ntpq verbatim?

There is an RTEMS shell command available in the net-services package; however, it does not monitor the state. It should not be hard for someone who knows that area of EPICS to add it.

You could also provide a suitable config file to provide external access and do the same thing with any ntpq command based on ntpd.

@mdavidsaver
Member

If you do not want this monitoring and reporting of the NTP client state it can be removed or disabled by default.

I am not asking for removal. I am asking whether this (imo. expert debugging) information can simply be presented in a "raw" form with less effort?

Your comment makes it look like the "raw" form is space separated key=value pairs. Potentially with more keys than would be parsed into struct ntpq_rl_data. To my eye, this looks clear enough for an expert to parse.

 * The variable list as returned by the `rl` ntpq command. The output is:
 *
 * associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,


@anjohnson
Member

The commit 144f975 from the iocshSetError additions that the Core Group merged earlier today has caused a conflict in this branch, sorry @kiwichris, please rebase and force-push.

Note that the Appveyor build failure was a single "Failed in 1 hr" false positive job; I was trying to re-run that job but it aborted with "Pull request #528 is non-mergeable."

@kiwichris
Contributor Author

If you do not want this monitoring and reporting of the NTP client state it can be removed or disabled by default.

I am not asking for removal.

I am sorry, I misunderstood. This is a great discussion and well worth having.

I am asking whether this (imo. expert debugging) information can simply be presented in a "raw" form with less effort?

Your comment makes it look like the "raw" form is space separated key=value pairs. Potentially with more keys than would be parsed into struct ntpq_rl_data.

This is correct. The ntpq documentation provides a complete list. On Linux, FreeBSD etc ntpq uses the same protocol to query a server. There is no direct access to the internals of ntpd. I briefly looked into direct access and found a rabbit hole of potential problems and stepped back from that path.

To my eye, this looks clear enough for an expert to parse.

 * The variable list as returned by the `rl` ntpq command. The output is:
 *
 * associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,

Yes, this would remove the need for that code; however, you also lose the ability to know the synchronized state and whether the clock source is usable:

        if (strcmp("sync_ntp", rl.sync_status) == 0) {
            osdNTP_Set_Synchronized(1);
        } else {
            osdNTP_Set_Synchronized(0);
        }

If you have more than one clock source there could be issues if this one cannot cleanly signal its state. This code currently does not propagate this state. I was waiting for this change and this discussion to explore what this means. I feel on RTEMS and in an embedded system this may be important. Currently the code makes the same assumption about the clock source Linux does.

EPICS on Linux or Windows would use the OS system for time unless there is a specialized time source. RTEMS, being an embedded RTOS, does not come with external networked "clock management" as a standard feature. You need to embed it, and that means you need to manage it. I suppose this thread of discussion ends up at the EPICS system level and what the core devs see as being needed? Does NTP clock management on RTEMS:

  1. Rely on external management interfaces built into NTP (remote ntpq)? I do not favor this approach because it could be a DoS vector, complicating the ability to lock down a system at the network level.
  2. Rely on an EPICS shell command mapped to the ntpq command? Does not help multiple clock management reporting and status.
  3. Add NTP monitoring to iocStats using this parsed set of data?

Items 1 and 2 mean the parsing code can be removed, and item 3 means it remains. Note, the monitoring being discussed uses the ntpq networked protocol; however, it can be restricted to the 127.0.0.1 interface in the configuration file.

@kiwichris
Contributor Author

The commit 144f975 from the iocshSetError additions that the Core Group merged earlier today has caused a conflict in this branch, sorry @kiwichris, please rebase and force-push.

All good and thanks for the heads up.

- These changes require libntp and that is available in the
  rtems-net-services package. The repository is available on
  gitlab.rtems.org and it builds for the legacy and libbsd
  networking stacks.

- Configure ntpd with the following env variables:

    EPICS_TS_NTP_INET              : NTP server
    EPICS_TS_NTP_CONF_FILE         : ntpd configuration file
    EPICS_TS_NTP_LEAP_SECONDS_FILE : ntpd leap seconds file

  A configuration file is used if both a configuration file and
  server IP address are set.

@kiwichris kiwichris force-pushed the rtems-ntpd-7_0-with-5-and-6 branch from 532b0f6 to f5319dc March 1, 2025 23:58
@kiwichris
Contributor Author

I have rebased the PR against 7.0 as of today. The Codacy issues raised have me confused, so I am not sure how to address them or how you report things like this to Codacy.

Is this PR able to be merged?

@anjohnson
Member

Thanks, we are getting confused by Codacy issues at the moment too. There's an error in the Codacy image logs for this job which shows me this, @ralphlange any ideas?


I cancelled the Appveyor build for this PR, the queue there is slow and it doesn't build the RTEMS code anyway.

I don't think we'll let those weird Codacy issues stop us from merging this PR once we're happy with it. I haven't reviewed the latest changes here myself (just answering this for now) but I am putting RTEMS on our meeting Agenda for Wednesday.

@ralphlange
Contributor

When logged into Codacy, I can see more reasoning.

@mdavidsaver
Member

Items 1 and 2 mean the parsing code can be removed, and item 3 means it remains. Note, the monitoring being discussed uses the ntpq networked protocol; however, it can be restricted to the 127.0.0.1 interface in the configuration file.

Then I am in favor of approaches 1 or 2 as I would prefer not to see this complex print-and-reparse logic in Base.

Personally, I have no problem with allowing a read-only subset of NTP mode 6. imo. EPICS IOCs are by their nature always network attached, and thus always subject to possible DoS.

... you also lose the ability to know the synchronized state and if the clock source is usable: ...

How about a simpler way to parse out this one key/value pair? Of course a regexp. Perhaps also sscanf()?
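One way the sscanf() suggestion could look, as a hedged sketch: read the hex system status word and decode it using the layout given in the ntpq documentation (leap indicator in the top two bits, clock source in the next six, where source 6 is sync_ntp). Treating leap indicator 3 (alarm) as unsynchronized is an added assumption, and ntp_rl_synced() is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse the system status word from an ntpq `rl` reply such as
 *   "associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,"
 * Returns 1 if synchronized to an NTP server, 0 if not, -1 if no status. */
int ntp_rl_synced(const char *rl)
{
    const char *p = strstr(rl, "status=");
    unsigned int status;

    if (p == NULL || sscanf(p, "status=%x", &status) != 1)
        return -1;

    /* Bits 15-14: leap indicator (3 = alarm, clock not synchronized).
     * Bits 13-8: clock source (6 = sync_ntp). */
    unsigned int leap = (status >> 14) & 0x3u;
    unsigned int source = (status >> 8) & 0x3fu;
    return (leap != 3u && source == 6u) ? 1 : 0;
}
```

With 0615, the source field is 0x06 (sync_ntp), which matches the text ntpq prints alongside it, so the numeric decode and the existing strcmp() on "sync_ntp" should agree.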

Member

@mdavidsaver mdavidsaver left a comment


Is this PR able to be merged?

imo. no.

I'm sorry but I do not like the design being proposed. I would like to avoid our growing a custom, difficult to test, mostly hard coded, init system for RTEMS. This PR seems like a significant step in that direction.

#595 has my thinking about what an alternative design(s) could look like.

wrt. this PR specifically, I would want to see osdNTP_Monitor() removed. I would also want to see some explanation of why osdNTP_Runner() is needed. What all is the osdNTPPvt.config_source state machine meant to handle?

In general, I would like to see a PR which starts with the absolute minimum needed to setup and call rtems_ntpd_run() (assuming it can't already be called from the RTEMS shell). Then add only the simplest diagnostics necessary for a human to verify that the NTP client is running and synced. No new $EPICS_* environment variables. No hardcoded /etc file contents.

@kiwichris
Contributor Author

Is this PR able to be merged?

imo. no.

Thanks for the review and feedback.

I'm sorry but I do not like the design being proposed.

Thanks but there is no need to be sorry. It is good we work towards a suitable design and solution. My response here is to try and present the reasons for the choices made.

I would like to avoid our growing a custom, difficult to test, mostly hard coded, init system for RTEMS. This PR seems like a significant step in that direction.

From your ending comment I assume the hard coding reference is the configuration and leap second file fragments? The config files are in the NTP client configuration format. The file fragment is taken from the ntp01 test code in RTEMS. It is the way this stuff works on RTEMS. The code is only in the RTEMS POSIX directory.

Does EPICS testing for RTEMS support networking protocols like NTP? If so I would love the details so I can take a look. Did the NTP client for the legacy stack Eric wrote have a set of tests?

In the simplest configuration mode set the environment variable EPICS_TS_NTP_INET in a boot script with the IP address of the NTP server and the NTP client will use that server when started. I believe that meets a key requirement of your #595 approach.

The single env var EPICS_TS_NTP_INET does not handle some important things a fully functional NTP client may need. Examples are multiple NTP servers, or updating the leap seconds file when a new version is released without rebuilding the software. EPICS_TS_NTP_CONF_FILE is an environment variable you set with a path to a site specific configuration file. The approach is simple and meets the #595 goals, plus it means EPICS and RTEMS have no extra code between users and the NTP client code. The standard NTP configuration file format and its documentation can be used as a reference. These configuration files are no different to the files the systems integrators would use if running on Unix.

#595 has my thinking about what an alternative design(s) could look like.

I have responded in that issue and I agree with the approach. I am happy to consider changes; however, I would prefer that we understand and agree on what is acceptable first. The current approach attempts to:

  1. Follow what I saw in EPICS and what Andrew explained, which is backwards compatibility with older versions. This is achieved with the EPICS_TS_NTP_INET environment variable.
  2. Provide a synchronized status for the NTP clock source. It is my understanding from reviewing the EPICS clock source code that it is important to indicate if the clock source is synchronized. If I have misunderstood how that works, or NTP on EPICS systems is simply assumed to work, then please let me know.

wrt. this PR specifically, I would want to see osdNTP_Monitor() removed.

How do you monitor the NTP client to determine the clock source is synchronized and the clock source is stable and usable?

On an RTEMS system there is no ability to run some other process or use some other tool to monitor the state of NTP. I felt it important for systems engineers to know the state of the NTP client on RTEMS to aid integration. The monitoring and the EPICS command have proved valuable in the systems running this code in production. I added all I could because it was not easy to see what could be removed and still be useful. The data was put there by the NTP designers, so I respected their decisions and provided that data to the EPICS command. If EPICS has suitable hooks I can use, then yes, let's move to using them.

NTP running on RTEMS is a black box. I did look into inspecting the internal state of the client before adding the monitoring via a loopback socket however the NTP client code is not thread safe so there would have been race conditions.

I would also want to see some explanation of why osdNTP_Runner() is needed. What all is the osdNTPPvt.config_source state machine meant to handle?

The osdNTP_Runner() and osdNTPPvt.config_source manage the way a user can configure the NTP client.

RTEMS now supports a complete NTP client and like an instance running on Unix it is configured with a couple of files, the /etc/ntp.conf and /etc/leapseconds file. Without valid configurations the client will not run. RTEMS has moved away from providing back doors and unsupportable global variables and calls to using the standard means a package supports. The approach will bring long term stability to EPICS and RTEMS integration because it is an agreed public facing interface for the related package.

The monitoring and triggering of configuration states lets a user add an environment variable independent of the state of the NTP client and the client will start. There are no services or systemd to manage that on RTEMS. I was not sure how EPICS on RTEMS universally initialized across all the supported BSPs and possible configuration methods. When is a network interface available to use or a DHCP lease received? The running and configuration states let the NTP configuration and start happen asynchronously to the initialization and environment variable setup.
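A hypothetical sketch of that polling idea, reduced to a pure step function so it can be tested on a host. The state names are illustrative only and are not the PR's osdNTPPvt.config_source values:

```c
#include <assert.h>

/* Hypothetical states; the PR's osdNTPPvt.config_source may differ. */
typedef enum {
    NTP_WAIT_CONFIG,  /* no EPICS_TS_NTP_* configuration seen yet */
    NTP_WAIT_NETWORK, /* configured, but no usable interface yet */
    NTP_START,        /* ready: write /etc/ntp.conf and start ntpd */
    NTP_RUNNING       /* ntpd started; nothing more to do here */
} ntp_state;

/* One step of a polling runner. Called periodically, it advances only when
 * its precondition holds, so a variable set late in the boot script still
 * ends with ntpd started. */
ntp_state ntp_step(ntp_state s, int have_config, int network_up)
{
    switch (s) {
    case NTP_WAIT_CONFIG:
        return have_config ? NTP_WAIT_NETWORK : NTP_WAIT_CONFIG;
    case NTP_WAIT_NETWORK:
        return network_up ? NTP_START : NTP_WAIT_NETWORK;
    case NTP_START:
        /* A real runner would write the config and start ntpd here. */
        return NTP_RUNNING;
    case NTP_RUNNING:
    default:
        return s;
    }
}
```

If initialization events were available (as discussed for #595), the same transitions could be driven by those events instead of by polling.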

In respect to #595 I did mention the timing of the network interface initialization and you could view the solution here as a self-contained instance of the same problem that code would face. If #595 handled initialization events and some sort of dispatch for RTEMS specific services the running code here could be removed.

In general, I would like to see a PR which starts with the absolute minimum needed to setup and call rtems_ntpd_run() (assuming it can't already be called from the RTEMS shell).

I am not sure what that achieves, as this PR provides that and then everything else that is needed. This code has been running in production systems non-stop since July 2024. As stated, I added all the NTP fields to the command to avoid any extra overhead and complexity for EPICS users of this NTP client. They do not need a remote ntpq command or a specific ntp.conf file to provide access.

Then add only the simplest diagnostics necessary for a human to verify that the NTP client is running and synced.

Could you please indicate using the ntpq data what that is?

No new $EPICS_* environment variables. No hardcoded /etc file contents.

How would the NTP client work without the configuration files it needs?

I think a "hardcoded" restriction that bans publicly documented file formats will make EPICS on RTEMS difficult. Maybe the EPICS core devs should discuss what this means and then reach out to the RTEMS community? This PR and #595 are aimed at cleaning up years of EPICS peeking under the hood at RTEMS and then breaking when RTEMS changes. Part of the solution, where appropriate, is to move to file formats the underlying software provides. Nothing here or in #595 has been created by RTEMS, and in the case of ntp.conf that format is over 30 years old. I hope you reconsider this position and appreciate that these are not formats created by RTEMS for EPICS. In RTEMS we consider integration and use carefully, along with the impact on users.

Some parts of this code, such as the default configuration for a single IP address and the monitoring, could be moved to RTEMS net-services, with tests added there, if this helps EPICS integration. The monitoring thread would be best handled in EPICS given it can manage priorities. I think the ability to allow a user to provide a configuration file is a good addition for EPICS and should be kept.

@hjunkes
Contributor

hjunkes commented Mar 11, 2025

Good morning Chris,
Today on my Beagleboneblack with RTEMS 6.1 and EPICS 7 with the NTP-PR I have this in the console:

beaglebone>
beaglebone> NTPTimeSync: Sync recovered at <undefined>
_bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (1 occurrences)
_bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (18 occurrences)
_bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (13 occurrences)
_bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (13 occurrences)
_bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (12 occurrences)
NTPTimeSync: NTP requests failed since <undefined> - Success
_bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (13 occurrences)
_bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (13 occurrences)
NTPTimeSync: Sync recovered at 2025-03-11 10:31:25.576713
cpsw0: ipv4_addroute: mcpsw0: ipv4_addroute: m_bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (12 occurrences)
_bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (2 occurrences)
_bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (2 occurrences)

beaglebone> _bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (4 occurrences)
_bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (2 occurrences)
_bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (2 occurrences)
_bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (2 occurrences)
_bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (2 occurrences)
_bsd_sonewconn: pcb 0x8e694148: Listen queue overflow: 7 already in queue awaiting acceptance (3 occurrences)
pthread_create error No more processes
epicsThread: Unexpected C++ exception "unable to create thread" with type "N11epicsThread20unableToCreateThreadE" in thread "TCP-acceptor" at Tue Mar 11 2025 10:48:12.375603187

*** EXIT STATUS NOT ZERO ***
exit source: 5 (RTEMS_FATAL_SOURCE_EXIT)
exit code: 1 (0x00000001)
RTEMS version: 6.0.0.b4149b2282ee3543455894ae7981703de1d719ac-modified
RTEMS tools: 13.3.0 20240521 (RTEMS 6, RSB b1aec32059aa0e86385ff75ec01daf93713fa382-modified, Newlib 1b3dcfd)
executing thread ID: 0x0b010021
executing thread name: TCP-acceptor
U-Boot SPL 2018.09-00002-g0b54a51eee (Sep 10 2018 - 19:41:39 -0500)

I have not yet investigated where this suddenly comes from. I haven't seen it before.
Heinz

@kiwichris
Contributor Author

Good morning Chris, Today on my Beagleboneblack with RTEMS 6.1 and EPICS 7 with the NTP-PR I have this in the console:

[snip]
I have not yet investigated where this suddenly comes from. I haven't seen it before. Heinz

Thanks for the report. It looks resource or network related. NTP is kindly reporting its state to the console. I do not think it is related to this PR.

I suggest moving this to the RTEMS Discourse site at https://users.rtems.org under Applications / EPICS category?

@mdavidsaver
Member

@kiwichris You write quite an essay ;) I will try to pick a couple of essential points to address. Please feel free to call me out if I skip over something which you feel is essential.

I'll start with the notion of configuration files.

From your ending comment I assume the hard coding reference is the configuration and leap second file fragments?
... It is the way this stuff works on RTEMS. ...

I am referring to these configuration files. This may be the way things are normally done with RTEMS in general, and I'm sure there are good reasons for this, eg. your general case is likely very general. However, for the much less general case of EPICS IOCs, I think there is a better way.

... These configuration files are no different to the files the systems integrators would use if running on Unix.

My objection is not to the content, but rather to the packaging. imo. these stop being practical configuration files when the content is embedded in C code!

An analogy which might help you to understand what I see. This PR seems to me like a Linux distributor providing a single pre-built initramfs image, expecting each installation to unpack that image, manually edit some files, then pack it back up again. What I have in mind is analogous to the automation which avoids requiring end users to even know about initramfs. Now I don't know if we will be able to get all the way to that point, but I think it is necessary to try.

I did not attempt to expand on these ideas in detail above because it will require more knowledge than I think is reasonable to ask of you ( @kiwichris ) about the EPICS build system, how the wider EPICS ecosystem is organized, and what our user's expectations are.

Touching on a couple of points:

  1. Config files should appear in the source tree as individual files.
  2. Where possible, epics-base should provide reasonable defaults. BSP specific if necessary.
  3. It should be possible for a support module or IOC application to replace some of these default files without recompiling epics-base

To give you an idea of where I think this design starts, I have encouraged @anjohnson (our Make maker) to think about how target filesystem fragments (including config files) should be placed into the installed tree of each EPICS module.

@mdavidsaver
Member

The monitoring and triggering of configuration states lets a user add an environment variable independent of the state of the NTP client and the client will start.

In the EPICS world, this change action should be handled with an IOC shell command.

Remember that every IOC instance will have an IOC shell script. This is how an instance is "personalized". So it is normal for an engineer setting up, or changing, an IOC to do so by issuing IOC shell commands.

So while RTEMS shell is mostly used (I guess) for diagnostics, the IOC shell is first and foremost the central method for configuring an IOC instance. The many diagnostic tools are icing on that cake. Understanding this, and all that it implies, is one of the big mental hurdles everyone has to clear in understanding the way EPICS IOCs work in practice. (so you are in good company if this seems foreign to you now)

There are no services or systemd to manage that on RTEMS.

Clearly. Otherwise you would not be trying to add a lite version to epics-base ;)

I was not sure how EPICS on RTEMS universally initialized across all the supported BSPs and possible configuration methods.

There isn't a way. The situation so far is imo. a mess of organic growth over many years. I am looking to exploit the disruption which is already being caused by the RTEMS 4 -> 6 transition to sweep some of this mess away, or at least isolate it along with the "legacy" network stack code.

When is a network interface available ...

Right, exactly my concern. There will never be one answer to a question like this. So my "answer" is to make it scriptable, and allow this scripting to be easily overridden/extended at various levels.

@kiwichris
Contributor Author

@kiwichris You write quite an essay ;)

Yes it was a lot but I felt it easier to lay out things in the hope it saves time.

I will try to pick a couple of essential points to address. Please feel free to call me out if I skip over something which you feel is essential.

I welcome your feedback because it makes me rethink how RTEMS integrates with users. The work in the PR is what is needed, and that was the goal. Your feedback highlights real issues, which is a good thing, and it shows the work presented needs to be broken up. I am happy to take this PR as a first-pass base to work from.

I'll start with the notion of configuration files.

From your ending comment, I assume the hard-coding reference is to the configuration and leap seconds file fragments?
... It is the way this stuff works on RTEMS. ...

I am referring to these configuration files. This may be the way things are normally done with RTEMS in general, and I'm sure there are good reasons for it (e.g. your general case is likely very general). However, for the much less general case of EPICS IOCs, I think there is a better way.

Makes sense, thanks.

... These configuration files are no different from the files the systems integrators would use if running on Unix.

My objection is not to the content, but rather to the packaging. imo, these stop being practical configuration files when the content is embedded in C code!

We could package them into EPICS as you say, but we need to preprocess the file to add the user's IP address from the boot environment when copying it to /etc. Does EPICS have support to take a file and expand env vars?

Would another possibility work: moving that code into RTEMS net-services, where the NTP code lives, and handling it there via an agreed interface? This has the benefit that each RTEMS release is responsible for making its own config file format work, and EPICS does not need to care.

An analogy which might help you to understand what I see. This PR seems to me like a Linux distributor providing a single pre-built initramfs image, expecting each installation to unpack that image, manually edit some files, then pack it back up again. What I have in mind is analogous to the automation which avoids requiring end users to even know about initramfs. Now I don't know if we will be able to get all the way to that point, but I think it is necessary to try.

The analogy works. If the file is in the EPICS configuration data, do you see a user updating a make variable to point to a user-specific configuration file that overrides the default behavior?

I did not attempt to expand on these ideas in detail above because it will require more knowledge than I think is reasonable to ask of you (@kiwichris) about the EPICS build system, how the wider EPICS ecosystem is organized, and what our users' expectations are.

Fair. I am learning and welcome any guidance or insights, so please call out any issues you see.

Touching on a couple of points:

1. Config files should appear in the source tree as individual files.

Agreed.

2. Where possible, epics-base should provide reasonable defaults. BSP-specific if necessary.

OK

3. It should be possible for a support module or IOC application to replace some of these default files **without recompiling epics-base**

Agreed and nice.

To give you an idea of where I think this design starts, I have encouraged @anjohnson (our Make maker) to think about how target filesystem fragments (including config files) should be placed into the installed tree of each EPICS module.

Sure and thanks.

@kiwichris
Contributor Author

The monitoring and triggering of configuration states lets a user add an environment variable independent of the state of the NTP client and the client will start.

In the EPICS world, this change action should be handled with an IOC shell command.

Thanks, that makes sense. There is a complication here. The RTEMS net-services wrapper around the NTP code provides the ability to stop and restart the service. How well that works depends on the options given to the client. The NTP code is old and its cleanup code on shutdown is questionable; it appears to rely on the process exiting to clean up. We have looked into dealing with this, and in some cases it works, but attempting changes quickly turns into a bottomless hole.

As a result, should we protect EPICS users by not allowing runtime configuration changes, so that a change requires a reboot? Doing so would remove that code.

Remember that every IOC instance will have an IOC shell script. This is how an instance is "personalized". So it is normal for an engineer setting up, or changing, an IOC to do so by issuing IOC shell commands.

You can see my embedded background conflicting here. This is a nice feature.

So while RTEMS shell is mostly used (I guess) for diagnostics

The RTEMS shell is not a good example of a shell. Yes, it can run commands, but it is basically a hard-coded text-name-to-function caller. Having access to it via the EPICS rt command is an important diagnostic tool.

the IOC shell is first and foremost the central method for configuring an IOC instance. The many diagnostic tools are icing on that cake. Understanding this, and all that it implies, is one of the big mental hurdles everyone has to clear in understanding the way EPICS IOCs work in practice. (so you are in good company if this seems foreign to you now)

I am seeing it and starting to understand but I do not think I am there. I only use EPICS enough to deal with RTEMS related issues and it shows.

There are no services or systemd to manage that on RTEMS.

Clearly. Otherwise you would not be trying to add a lite version to epics-base ;)

Haha yeah, let's not.

I was not sure how EPICS on RTEMS universally initialized across all the supported BSPs and possible configuration methods.

There isn't a way. The situation so far is, imo, a mess of organic growth over many years.

Agreed

I am looking to exploit the disruption which is already being caused by the RTEMS 4 -> 6 transition to sweep some of this mess away, or at least isolate it along with the "legacy" network stack code.

Excellent. I am on board with this.

When is a network interface available ...

Right, exactly my concern. There will never be one answer to a question like this. So my "answer" is to make it scriptable, and allow this scripting to be easily overridden/extended at various levels.

We saw the issue from the RTEMS perspective and added the net-services package as a place to smooth over the differences between the legacy, libbsd and even lwIP stacks. I am happy to provide interfaces that can be used in a consistent way in EPICS, for example a call that probes an interface to see if it has an IP address. This is off topic for this PR, and I will remove the configuration probing code.
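As a rough sketch of the kind of interface probe mentioned above, assuming a BSD-style stack that provides `getifaddrs()` (libbsd does; whether the legacy or lwIP stacks do is an assumption not checked here), such a helper might look like this. `iface_has_ipv4()` is a hypothetical name, not an existing net-services API.

```c
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: return true if the interface named `ifname`
 * currently has at least one IPv4 address assigned. Returns false if
 * the interface does not exist or the address list cannot be read. */
bool iface_has_ipv4(const char *ifname)
{
    struct ifaddrs *list;
    struct ifaddrs *ifa;
    bool found = false;

    if (getifaddrs(&list) != 0)
        return false;

    /* getifaddrs() returns one entry per (interface, address) pair. */
    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr != NULL &&
            ifa->ifa_addr->sa_family == AF_INET &&
            strcmp(ifa->ifa_name, ifname) == 0) {
            found = true;
            break;
        }
    }

    freeifaddrs(list);
    return found;
}
```

An IOC shell command could wrap a call like this so that startup scripts wait on, or report, interface readiness, keeping the policy scriptable rather than hard-coded.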

One item remains: I can move the monitoring code to RTEMS net-services, and then EPICS can selectively decide whether to monitor NTP for synchronized time. This would let EPICS control the synchronized state of the clock source. Does this work for you?
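For the synchronized-state check, one possible approach on a kernel with NTP support is to query `ntp_adjtime()` with no modes set and treat `TIME_ERROR` as unsynchronized. This works on Linux and FreeBSD; that RTEMS 6 exposes the same call through its kernel NTP support is an assumption. `clock_synchronized()` is a hypothetical name, not part of EPICS or net-services.

```c
#include <sys/timex.h>

/* Hypothetical helper: ask the kernel whether the system clock is
 * NTP-synchronized. Returns 1 if synchronized, 0 if not, -1 on error.
 * With tx.modes == 0 the call only reads state and changes nothing,
 * so no special privileges are needed. */
int clock_synchronized(void)
{
    struct timex tx = {0};
    int state = ntp_adjtime(&tx);

    if (state == -1)
        return -1;
    /* TIME_ERROR is reported while the clock is unsynchronized. */
    return state != TIME_ERROR;
}
```

EPICS could poll something like this from its time provider to decide whether to report the clock source as synchronized, independent of how the NTP client itself is started.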

@mdavidsaver
Member

... Does EPICS have support to take a file and expand env vars?

Oh yes. Several ways at build and runtime.

Most relevantly, the IOC shell expands environment variables as each line is executed. This is why the notion of eg. pre-populating the environment from eeprom is a natural one. Quite a lot can then be done with these values.
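To illustrate the template idea from the question above, here is a minimal standalone C sketch that expands `${NAME}` references from the environment. For example, it could turn a `server ${EPICS_TS_NTP_INET} iburst` line into a concrete ntp.conf line before the file is written to /etc. `expand_env()` is hypothetical and is not EPICS's own macro library; in this sketch, undefined variables simply expand to nothing.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `in` to `out`, replacing each ${NAME} with
 * getenv("NAME"). Undefined variables expand to an empty string.
 * Returns 0 on success, or -1 on truncation or a malformed reference;
 * `out` is always NUL-terminated when outsz > 0. */
int expand_env(const char *in, char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return -1;

    while (*in) {
        if (in[0] == '$' && in[1] == '{') {
            const char *end = strchr(in + 2, '}');
            char name[128];
            const char *val;
            size_t n;

            if (end == NULL) { out[o] = '\0'; return -1; }   /* no closing brace */
            n = (size_t)(end - (in + 2));
            if (n == 0 || n >= sizeof name) { out[o] = '\0'; return -1; }
            memcpy(name, in + 2, n);
            name[n] = '\0';

            val = getenv(name);
            if (val != NULL) {
                size_t len = strlen(val);
                if (o + len >= outsz) { out[o] = '\0'; return -1; }
                memcpy(out + o, val, len);
                o += len;
            }
            in = end + 1;                                    /* skip past '}' */
        } else {
            if (o + 1 >= outsz) { out[o] = '\0'; return -1; }
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return 0;
}
```

In practice the IOC shell already performs this kind of expansion line by line, so a config template could live in the source tree as its own file while site-specific values come from the environment.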

Here is one intimidatingly complex example of a reusable IOC shell fragment.
