Data Historian Best Practices for AI & Modern Applications

If you’ve been feeling pressure to get your process data into AI tools, cloud platforms, or third-party analytics applications, you’re not alone. Manufacturers across every industry are being asked to do more with their data than ever before, and the data historian sitting at the heart of your operation is either going to be the foundation that makes it possible, or the bottleneck that holds you back.

The problem is that most historians are configured the same way they were a decade ago. The hardware recommendations, the network setup, the access strategy, it was all designed for a world where PARCview or a similar client was the only thing reading the data. That world has changed. AI applications, cloud pipelines, and third-party platforms are now first-class consumers of process data, and your historian needs to be ready for that.

In this session, the dataPARC team walks through what good historian configuration actually looks like today. We start with the Purdue Model as a framework for understanding how data should flow through your network, then work through infrastructure best practices, high availability, API access for external applications, and data governance. Whether you’re evaluating a historian for the first time or looking to get more out of the one you already have, this is the conversation worth having before you build anything.

Data Historian Best Practices Video

Data Historian Best Practices Transcript:

0:02

Hello, everybody. Thank you for joining us today.

0:06

This is going to be a webinar that we call Industrial Data Insights, and the title is going to be Data Historian, Best Practices for AI and Modern Applications.

0:19

I’m Aaron Moser, and I’m here to introduce Wayne McAdams and Nick Imers, two of our senior project engineers who will be talking today.

0:29

But before we get into that, let me go over a little bit about our webinars and a little bit about dataPARC.

0:36

So we normally produce multiple different kinds of webinars every year.

0:39

One is the dataPARC type university webinar where we really go over the skills and techniques of using dataPARC tools.

0:48

Another type that we produce is Industrial Data Insights, where we try to take a step back and take a broader industrial perspective on how software and data are used in the industry.

1:01

Then we also have solution spotlights, where we normally partner with a customer and go over how they are utilizing the dataPARC software in a real application at their site.

1:13

All those webinars can either be found on our dataPARC community forum where you can see the university solution spotlights, industrial data insights and quick tip webinars, and also on our YouTube channel where we show more just the solution spotlights, industrial data insights and quick tips.

1:34

And during this webinar, if you have any questions, please type them in the chat. You can also use the chat if you’re having audio issues and want somebody to come back with some help or suggestions on how to deal with them.

1:47

You can also email your questions to dp-info at dataPARC.com and we can reply after the webinar. If we don’t get to a question that you post during the session, we will get hold of you via email afterwards.

2:07

So if you’re not familiar with dataPARC, here’s a little bit of background about our company.

2:11

In 1980, a group of engineers working at a plant needed to develop some tools around historian and plant information systems.

2:21

Then in 1997, they decided to form a dedicated company or group on producing those tools and providing them to industry.

2:31

Today, that has grown to where we have customers across the globe and across many, many industries.

2:37

dataPARC is trusted by manufacturers globally, whether that’s in the pulp and paper industry, oil and refining, general chemicals, power, municipalities, or any other type of plant that needs data to help run and understand their process.

2:55

dataPARC is headquartered in Rochuga, Washington, but we’re part of the Global Voice Group.

3:01

The company is focused on data historians, real-time visualization, analytics, and monitoring.

3:10

Now that you have a little bit of background about dataPARC, let me hand this over to Nick Imers and Wayne McAdams to discuss today’s topic.

3:21

Thank you for that introduction, Aaron.

3:23

So real quick, let’s run through the agenda so we can see where we’re headed. We’re going to be talking about data historian architecture today, which is increasingly relevant in our increasingly data-driven world, and why it’s important that you have reliable, consistent, fast data at your fingertips all the time and that you can get it where you want it. We’re going to spend a lot of time discussing the Purdue model specifically and how dataPARC fits into that.

3:55

And for any of our existing customers that are watching, one thing I did want to call out is we’re going to be talking specifically about our newer historian, which is called dataPARC.store.

4:04

It is different than our classic historian, which is PARC History.

4:10

We’ll talk about it a little bit more in the slides, but PARCHistory has been a very solid historian for many decades now.

4:17

But times have changed, and so store is a better fit for this.

4:20

And so that’s why we’re focusing on it for this presentation.

4:25

Wayne will be talking about some hardware and infrastructure best practices.

4:28

We’ll get the most out of it.

4:30

We’ll be talking a little bit about uptime, how it relates to high availability, and how that relates both on the data collection side and on the data retrieval side.

4:40

There’s one example of an enterprise customer.

4:44

And then we’ll cover how we differ from some other historians and how we control data, and we’ll have a short Q&A.

4:53

Thank you, Nick.

4:55

Why does all this matter?

4:57

Historian configuration has really become a more strategic decision over the decades that we’ve been in business.

5:05

And it’s not just an IT decision anymore.

5:08

It spans IT, OT, corporate decision-making.

5:15

AI applications are demanding more from historians than ever as that industry starts to mature.

5:24

Cloud platforms and third-party tools need data access.

5:28

The world is becoming more and more interconnected, and we’re using more and more third-party tools to maximize the benefit of the data that we’re collecting, and we need to be able to provide the access to tools as simply as possible.

5:49

Many historians still run based on decades old assumptions and decisions.

5:53

What this means is that a lot of them were installed a long time ago, into a specific sort of data use environment: concentrating on process optimization at the process floor level, with some penetration into the boardroom and into decision-making. But over the years historians have evolved to become a much more integral part of the overall decision chain on capital and operations matters as well. And modern applications are hungry for the data.

6:30

We collect a lot more data than we used to as capabilities have improved, and we need to be able to get that data to third-party applications and business execution systems as quickly as is reasonable for them to do what they need to do.

6:51

So we need to be able to use deep-level access.

6:56

Some communication standards are very robust, but not very fast, so we need to be able to offer faster options to users of the data.

7:10

Also, this is your data.

7:12

And whatever work you want to put that data to, your historian needs to be able to facilitate that as much as possible.

7:25

All right, let’s talk a little bit about IT OT infrastructure and the Purdue model.

7:32

Historically, OT systems, operations technology systems, were isolated from corporate information technology systems.

7:39

Demand for real-time operational data and business decisions, IoT initiatives, cloud-based analytics platforms, and remote monitoring requirements have driven the need for communications between IT systems and OT systems.

7:54

We just really can’t live without our operational data for analysis in real-time anymore.

8:01

The Purdue model is a hierarchical framework that was designed in the 1990s for organizing and segmenting industrial IT and OT networks into discrete levels with controlled communications between them.

8:13

This was an attempt to build a structure that folks could use in their network design and their plans so that they could control information flow for cybersecurity isolation, define what their control data pathways are going to be, and get organizational clarity on who owns what, what accountability IT and OT personnel have, and change management.

8:43

This system is showing its age, since it was designed in the 1990s.

8:46

But this is still a pretty dominant reference model that many people use for designing the cybersecurity infrastructure in their plant.

8:54

I’m going to discuss some things we see on the slide on the right side.

8:59

Typically, level zero through level three are called the control zone.

9:03

That’s where your control systems reside in the plant.

9:07

Of course, there may be multiple networks for different vendors’ applications, but level zero through level three are the reference levels that say this is a control system.

9:19

Level four and level five are your business networks.

9:25

That’s where your accounting systems are, where your users usually are, and where your Internet and Cloud connections are.

9:36

Level 3.5 is considered the Demilitarized Zone, which is what DMZ stands for.

9:42

That’s the level where you forge connections in between the isolated OT level and the more available IT level.

9:54

Note that we also show some firewall icons over there on the right, and that’s the way the Purdue reference architecture says to design the system.

10:10

Data historians sit in the boundary layers, and they move data between the IT and OT network, which is a very challenging thing to do security-wise.

10:19

One thing data historians need to have is strong application isolation.

10:24

There are data historian applications that need to not be involved in data collection.

10:30

They need to have strong authentication and encryption.

10:34

And they need to require minimal opening of firewalls.

10:38

And also, we need to not be talking into the OT network.

10:45

We need to isolate the OT network so that we only respond to connections from it. For example, our data collector in the L3 network initiates communications into the DMZ.

10:57

And until it initiates that communication and establishes that secure data pathway, we do not try to communicate with the OT network.
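
As an illustration of the rule described above (connections are opened into the DMZ, never from the DMZ or the enterprise side into the control zone), here is a minimal Python sketch. The zone names and the `may_initiate` helper are assumptions made for this example; they are not part of any dataPARC product or of the Purdue reference itself, and a real site would enforce this in firewall rules rather than in application code.

```python
from enum import Enum


class Zone(Enum):
    CONTROL = "control"        # Purdue levels 0-3 (OT)
    DMZ = "dmz"                # Purdue level 3.5
    ENTERPRISE = "enterprise"  # Purdue levels 4-5 (IT)


# Assumed policy for this sketch: a connection may only be initiated
# *into* the DMZ. Nothing initiates a connection into the control zone.
ALLOWED_INITIATIONS = {
    (Zone.CONTROL, Zone.DMZ),     # e.g. a collector in L3 opens a session to the historian
    (Zone.ENTERPRISE, Zone.DMZ),  # e.g. a client in L4 requests data from the historian
}


def may_initiate(source: Zone, target: Zone) -> bool:
    """Return True if `source` is allowed to open a connection to `target`."""
    if source == target:
        return True  # traffic inside one zone is outside the scope of this sketch
    return (source, target) in ALLOWED_INITIATIONS
```

With this policy, `may_initiate(Zone.CONTROL, Zone.DMZ)` is allowed while `may_initiate(Zone.DMZ, Zone.CONTROL)` is not, which mirrors the idea that the historian waits for the collector to establish the pathway.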

11:09

Yeah, so Wayne has gone over the basics of the Purdue network.

11:13

And like you mentioned, every network is going to be a little bit different, especially with these OT systems. Sometimes you have legacy hardware or software that can be very old and just impossible to properly secure.

11:32

And so it may have multiple layers of segmentation or firewalls around just that to try to secure it.

11:40

But as far as the basic Purdue model goes and how dataPARC fits into it, I’ve tried to make a little flow diagram here so we can follow the data from one side to the other.

11:50

And so just to kind of go over the flow that I’ve tried to show here.

11:53

So on the left, we have the process control zone.

11:56

So we have L0 through L2 all bundled together into one.

12:01

And then on the far right side, we have the enterprise zone, or L5.

12:06

So effectively we’re going from the left, where things are most secure, to the right, where things are the least secure, or at least where you would expect them to be the most vulnerable to some sort of incursion.

12:21

And just like Wayne said, a lot of the basics of the Purdue model is trying to ensure that data flows, in this diagram, from left to right, and as little as possible in the other direction.

12:36

On the process control zone, you have some control system, whether it’s a DCS or a PLC or a SCADA, it doesn’t really matter.

12:45

Then generally, it’s being fed up to something.

12:49

An OPC server traditionally plays a role like a printer driver: it makes it easy for other systems to get that information without speaking the proprietary language of the control system itself.

13:00

But all of that usually lives on something like L2.

13:05

Then we have L3, which is still within what Wayne called the control zone before.

13:10

Typically, this is where our data collection app lives, which for Store is just called Collect.

13:17

You know, there are various, you know, again, some of this stuff can change and shift.

13:23

You know, I think store is pretty flexible, but usually this is a good place for it because there usually is not a firewall in between levels two and three.

13:31

And that’s good for classic OPC, you know, OPC DA and HDA.

13:37

They were actually designed in the mid-nineties.

13:40

So it’s like three decades now.

13:44

And they weren’t made for the modern environment, really, so they don’t handle firewalls well. If you’re using them, you need the collection app to be pretty close to the OPC server.

13:56

UA is more modern, it uses security certificates, so it’s more flexible.

14:00

But I will say L3 is a good place for this, especially for the classic DCOM-related stuff, because dataPARC Collect can sit there, there’s no firewall in the middle, and it can buffer data, so if there are disruptions, it doesn’t get lost.

14:14

You know, some people will put collect in the DMZ, but that does mean that you’re going through a firewall, which doesn’t work well for DCOM.

14:20

And it does make you more prone to data loss: if something happens with the network connection, that’s data that you’ve lost forever.

14:27

So level three is a good place for collect.
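
The buffering behavior described above can be sketched in a few lines of Python. This is only an illustration of the store-and-forward idea, not how Collect is implemented: the `StoreAndForwardBuffer` class, the `send` callback, and the sample layout are all hypothetical, and a real collector would persist its buffer to disk rather than hold it in memory.

```python
from collections import deque
from typing import Callable, Deque, Tuple

Sample = Tuple[str, float, float]  # (tag name, unix timestamp, value)


class StoreAndForwardBuffer:
    """Hold samples locally while the link to the historian is down."""

    def __init__(self, send: Callable[[Sample], None], max_samples: int = 100_000):
        self._send = send
        # A bounded deque silently drops the OLDEST sample once it is full.
        self._pending: Deque[Sample] = deque(maxlen=max_samples)

    def submit(self, sample: Sample) -> None:
        """Queue a new sample, then try to forward everything queued."""
        self._pending.append(sample)
        self.flush()

    def flush(self) -> int:
        """Forward queued samples in order; stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # link is down: keep the sample and retry later
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._pending)
```

A sample is removed from the queue only after `send` returns without raising, so a network disruption between the collector and the historian delays data rather than losing it, up to the size of the buffer.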

14:29

It’s reliable.

14:31

And then you generally have the historian sitting in the industrial DMZ, level 3.5.

14:38

And like Wayne mentioned, you know, we’re really trying to make this the buffer zone where data does not flow from the right side back across this.

14:46

And so that’s where you generally will have a firewall on either side of the DMZ, trying to buffer the control zone from the enterprise zones on the right.

14:57

In an ideal world, you would really have nothing that flows from right to left, but you’ll see there’s a few connections we’ll talk about.

15:03

But for the most part, data only flows up and then there’s a few retrievals coming from the other side here.

15:08

But the main one here is we have store, and it has its own little configuration database that runs out of Postgres or it can run on.

15:16

It’s a very lightweight database, it’s just configuration.

15:19

There’s no actual data, so it works even in Express.

15:22

But the one thing to keep in mind is that this is where the historian archive files live, so it can write to them and serve them up as needed.

15:30

Although, again, for more complicated networks, we’ve even seen people do store-and-forward type setups where they have a more segmented network than the traditional Purdue structure.

15:40

Store will still work for that, just have to go through a little more work.

15:45

But in the traditional sense, Store will live here in 3.5, and then you have a bunch of clients and the UA servers that are generally on our application server.

15:57

In the old world, we would have called that the business network, which is now L4, and they’re reading from the historian on demand as things like PARCview clients ask for it.

16:08

But like Wayne mentioned, they’re just going on one port.

16:11

If you care, it’s gRPC, which is basically HTTP, and it’s using encryption via TLS, so it’s quite secure.

16:20

Just one port in the firewall, fairly minimal.
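
To make the "one port, encrypted with TLS" point concrete, here is a small standard-library Python sketch of the client-side TLS settings such a connection would typically rely on. It deliberately does not use any dataPARC or gRPC API; the host name and port below are hypothetical placeholders, and the choice of TLS 1.2 as a floor is an assumption for the example.

```python
import ssl

# Hypothetical values for illustration only.
HISTORIAN_HOST = "historian.dmz.example"
HISTORIAN_PORT = 443  # the single port opened in the firewall


def build_client_tls_context() -> ssl.SSLContext:
    """Build a TLS context that verifies the server certificate and host name."""
    # create_default_context() enables certificate and host name checks by default.
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A client would pass this context to whatever library opens the connection to `HISTORIAN_HOST` on `HISTORIAN_PORT`, so the only firewall opening needed is that one port.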

16:24

Then it serves it up to whatever needs it, which it can include things.

16:28

In this case, I’ve drawn our Nexus services in L4, but they could even live in L5 and that wouldn’t be a deal breaker or anything like that.

16:36

It’s just a matter of whether you want, for instance, phones to be able to access them, whether they’re directly on the mill network without a VPN or coming from the internet, or whether you want even more firewalls in the middle to make it more secure.

16:50

But largely, it makes it really nice to break it down.

16:54

This is something where, like I said, PARCHistory just wasn’t quite as flexible for that.

16:59

It’s just built for a time where, like Wayne said, you have the IT network and the OT network, and not a lot in between, but store is pretty flexible.

17:10

Let’s talk a little bit about hardware and infrastructure recommendations.

17:15

I always joke with customers that the answer is “more,” but there obviously is not an infinite resource on hardware and infrastructure, so let’s talk a little bit about what you’re going to need for a historian implementation.

17:30

The first thing I’d like to talk about is storage.

17:35

Storage is cheap, relatively speaking, and gets cheaper every year.

17:41

Historians need to have tools to help manage storage.

17:46

So you can have expensive storage and cheaper storage.

17:51

So you need to be able to tier that.

17:54

For example, one of our current recommendations is to use solid-state devices for near-now data, let’s call it the last year.

18:06

That data is more active, it’s being used more often by consumers, and it has all sorts of calculations being done against it before it’s permanently stored. Anything we can do to speed up that near-now data pays dividends in usability.

18:30

Slower bulk storage can be used for older data that does not need that near-real-time availability.

18:38

If somebody is looking back at data that’s five years old, it may take an extra five seconds to retrieve, but that’s okay for that kind of analysis.
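
The tiering idea above can be expressed as a simple rule. The following Python sketch assumes a one-year "near-now" window, following the "last year" figure mentioned in the talk; the function name and the tier labels are made up for this example and do not correspond to any historian setting.

```python
from datetime import datetime, timedelta

# Assumed hot window, following "let's call it the last year".
HOT_WINDOW = timedelta(days=365)


def storage_tier(archive_end: datetime, now: datetime) -> str:
    """Return 'ssd' for archives that ended within HOT_WINDOW, else 'bulk'."""
    return "ssd" if now - archive_end <= HOT_WINDOW else "bulk"
```

A maintenance job could run a rule like this periodically and move archive files whose tier has changed from fast storage to slower bulk storage.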

18:49

Industrial time series data is really no different than any other enterprise data.

18:55

It’s just based on a storage device.

18:58

So any tools a historian has to help you manage how you store data in different locations, that stuff could be very useful and you can maximize whatever resources you have inside your facilities or inside your enterprise.

19:16

Networking.

19:17

Again, networking in plants constantly gets better but constantly gets used more and more as more and more data intensive applications go all the way to the production floor.

19:33

So, you’re always going to be in a struggle with other uses of that kind of infrastructure.

19:40

For enterprise networking, we have corporate users of data that might not be on site and are accessing the data off-premises, and we also have on-premises data access, and both require the proper bandwidth.

19:54

On-prem, usually your mill network infrastructure is fine.

20:00

Off-premises data access, we tend to use the rule of thumb of greater than 100 and less than 100.

20:06

Greater than 100 megabits per second of bandwidth, less than 100 milliseconds of latency.

20:12

This provides an acceptable user experience when someone is trying to access the data.

20:20

Compute and memory.

20:22

As collection and usage grow, monitor your services and add memory and CPUs as needed.

20:30

So when a data historian goes into a plant that’s never had one before, or as these new technologies take over how we think about analyzing and using data, our data collection tends to just grow.

20:51

You’re going to just need to keep an eye on what’s being used today and what’s being used tomorrow.

21:01

As memory and CPU are needed, you need to find a way to add them to your system.

21:07

Maintenance and future-proofing.

21:12

Make infrastructure adequate to accomplish historization goals, how much data you want to collect.

21:18

Keep an eye on longer-term goals when you’re setting up that infrastructure.

21:21

It’s not just today’s goals.

21:23

Like I said, data collection and data use tend to grow over time.

21:29

Plan for flexibility more than getting it right the first time.

21:33

Be prepared for unanticipated needs.

21:35

I think the explosion of AI in industry is something that not many people anticipated.

21:46

Just be aware that you might need to be flexible to meet tomorrow’s data needs.

21:54

To talk through something of a real-world example: this is not the architecture diagram from that customer.

22:02

This is more like a sample, but these are stats from an actual customer of ours.

22:09

They have 700,000 tags connected in Store.

22:14

This is a site where it’s more of a centralized historian.

22:19

You could call it corporate, but I think that’s not quite right because again, anyone can use it.

22:23

But it’s taking data from multiple different manufacturing facilities and putting it into one spot, which partly explains the larger than normal tag count, but again, it shows what Store can handle.

22:35

I think it’s a great solution for a couple of thousand tags.

22:38

In this case, it works well for several hundred thousand tags as well.

22:43

It can handle that level of load for writes, and then on the retrieval side of things, there are as many as 650 users of the system.

22:54

It can take quite a bit of load as well without bogging down.

22:58

Although again, I think Wayne will speak a little more to that here in a bit.

23:03

But it depends a lot on your hardware.

23:07

But in this case, you can see that they had quite a bit of RAM and they have quite a bit of storage to store all that data.

23:19

But again, storage is getting cheaper as time goes on, so it’s usually not the pain point it was a few decades ago.

23:27

The next thing Nick and I would like to discuss is sort of high availability and uptime.

23:34

So we all have different needs for our data.

23:38

We might have environmental data that we have actually federal regulations on how much data we’re allowed to miss during a year, safety monitoring data, and process data.

23:52

So we have several different redundancy strategies, and there are some things you need to think about when you’re thinking about redundancy. We can have collection redundancy, where we have multiple collection connections to process data; that would provide better uptime if the connection into the OT system is weak. We can do store and forward, or we can make the two systems aware of each other so that if one goes down, the other one picks up and we lose very little process data.

24:27

We can also do server redundancy: we can build a complete data historian path with multiple data historians and multiple collectors, and critical data access becomes much more robust when you add that parallel path for all the collection.
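
To put rough numbers on why a parallel path helps, here is a short Python sketch of two standard availability calculations: how much downtime per year an uptime target allows, and the combined availability of redundant paths. It assumes the paths fail independently, which shared power or shared network segments can easily violate; the function names are made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability: float) -> float:
    """Hours of downtime per year permitted by an availability target (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` independent paths when any one path is enough."""
    return 1.0 - (1.0 - single) ** copies
```

For example, a 99.9% target allows about 8.76 hours of downtime per year, and two independent paths that are each 99% available give a combined figure of about 99.99%.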

24:46

To achieve an uptime of 99.9% plus, we have buffering at the collector layer.

24:54

The data will move into the historians once it becomes available.

25:00

It will reconnect and backfill if you lose connection to your collector.

25:07

Also, if your OT infrastructure has its own historian, has data storage as well, we could configure backfill from that data.

25:17

So if our collection fails and their collection stayed online, we can use that data to reconstitute what we missed.
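
The backfill idea described above, filling a gap in our own collection from a second source that stayed online, can be sketched as a merge of two time series. This is a simplified Python illustration with hypothetical names; it keeps existing samples untouched and only copies samples from the secondary source that fall inside the known gap.

```python
from typing import Dict

Series = Dict[float, float]  # unix timestamp -> value


def backfill(primary: Series, secondary: Series,
             gap_start: float, gap_end: float) -> Series:
    """Return a copy of `primary` with the gap [gap_start, gap_end] filled from `secondary`."""
    merged = dict(primary)
    for timestamp, value in secondary.items():
        # Only fill inside the gap, and never overwrite what was collected.
        if gap_start <= timestamp <= gap_end and timestamp not in merged:
            merged[timestamp] = value
    return dict(sorted(merged.items()))
```

Restricting the merge to the gap window is a design choice for this sketch: it avoids mixing two sources outside the period where the primary collection is known to have failed.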

25:27

And always, of course, have a plan.

25:31

Documented disaster recovery procedures and drills.

25:35

I’ve never been a big fan of drills, but I do like to write things down so if something goes wrong, I know what I’m going to do to fix it.

25:44

And that might reveal a weakness to where you actually have physical hardware that’s staged and ready to replace if you have a physical hardware failure.

25:55

But that kind of exercise of going through failure scenarios can help you minimize whatever data loss you might have.

26:06

Some failure scenarios to plan for.

26:09

Server hardware failure is an obvious big one.

26:11

In today’s world of virtual systems, that is pretty rare, but it can happen.

26:21

If you build a high availability system, you need to be aware of where your single points of failure might be and eliminate them if at all possible.

26:29

Network outages and partitions.

26:32

Obviously, a network outage where the data can’t flow can cause issues.

26:37

Having a collector that does store and forward can make that data recoverable, because the data is held as close to the data collection point as possible, where a network disruption is least likely to disrupt the actual data collection.

26:56

And once the network comes back online, the collector can forward the data.

27:03

Also, high availability systems can help you plan for maintenance windows and patching, especially up in the business network tiers to maintain your security, you need to do regular Windows updates and patching, and you need to have a plan for that.

27:24

If the data needs to be available, you might set up a full high availability network to make sure that the data is still served to your customers: while server one is being patched, server two is carrying the load, and vice versa.

27:42

Yeah.

27:42

To talk a little bit more to the high-availability stuff but with a pretty overly complicated picture instead.

27:51

Just like Wayne said, this is effectively the same diagram as I showed earlier, showing how dataPARC slots into it, but with something of a full redundancy setup like Wayne described here.

28:04

In a lot of cases, this is essentially just a copy-paste. Before, L3 had one dataPARC Collect instance collecting data from whatever is available on the control side.

28:15

In this case, we have two dataPARC Collect instances.

28:18

So this is the collector redundancy that Wayne was talking about.

28:22

In this case, I’ve drawn multiple OPC servers.

28:25

And in this case, I’m showing both of these collecting from both OPC servers, or MQTT, or whatever the underlying system is.

28:34

I’m showing both Collect instances drawing from both OPC servers, so this is full redundancy in that sense: if one of the collectors were to fail, and it is still the case that many of the collectors are physical hardware and failures happen, the other one would pick up without any data loss.

28:55

Everything is a trade-off with the HA type stuff with the redundancy.

29:00

I would say it’s more work to set them up.

29:05

Mostly the big trade-off with redundant collectors, if they’re both collecting from these systems, is whether those underlying systems handle having both connections active at the same time.

29:15

Some systems have a cache and then they’re serving up essentially the same thing to both systems without any extra load and it’s perfectly fine.

29:23

Sometimes though they don’t do well with that extra load and the setup isn’t quite as feasible.

29:28

But this one again allows for a lot more flexibility, especially if things are unreliable. You’re much less likely to lose data, and data loss here is usually permanent unless, like Wayne mentioned, there’s some service like a process-side historian that has a backup of the data.

29:48

In the DMZ, again, it’s basically a copy paste.

29:51

In this case, I’m showing multiple instances of Store running on different machines.

29:58

In this case, they will both have, to some extent, an entire copy of the history archive.

30:06

Wayne had mentioned that you can configure maintenance jobs and things like that, and so there are trade-offs depending on what your business requirements are. For instance, the secondary could have less data than the primary, if that made sense, say if you’re really confident that you’re backing up the primary well and the secondary just needs to serve your users in a temporary window where the primary is being patched, like Wayne said.

30:36

Then again, I didn’t draw it on this one.

30:39

Some people do have redundant application servers for PARCview, but usually you only have one. Our UA servers are already set up for failover between Store instances, so that is fairly seamless, and then PARCview clients fail over to different UA servers.

30:56

If you do have multiple application servers and the one they connect to becomes unavailable, they can start flipping through a list of pooled UA servers to try until they find one that connects.
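
The "flip through a list until one connects" behavior can be sketched as a small loop. This Python example is a generic illustration of client-side failover over a pool of endpoints; the `connect` callback and the endpoint names are hypothetical, and it is not a description of how PARCview clients are implemented.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def connect_with_failover(endpoints: Iterable[str],
                          connect: Callable[[str], T]) -> T:
    """Try each endpoint in order and return the first successful connection."""
    last_error: Optional[Exception] = None
    for endpoint in endpoints:
        try:
            return connect(endpoint)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and move to the next server
    raise ConnectionError("no endpoint in the pool accepted a connection") from last_error
```

The caller only sees an error when every server in the pool has been tried, which is what lets a single server outage go unnoticed by the user.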

31:07

This means that eventually data, wherever it’s flowing, whether it’s to PARCview clients or out to some increasingly critical application in the Cloud, or to some offsite setup via one of our APIs, keeps flowing.

31:29

And so a lot of it is coming to terms with what you need for uptime.

31:35

But this is a setup where, if you do need that 99%, this will basically get you there.

31:43

You may need less, depending on what your requirements are. If a little bit of downtime while you’re patching isn’t a big deal, then maybe you don’t need this much, but this is a way that you can get there if you really need to treat it like critical infrastructure.

32:00

Let’s talk a little bit about using PARCview as a client for your data.

32:03

Let’s assume that we have our data historian set up and it’s built with the Purdue style segmentation.

32:09

So we have the data historian available to provide data to PARCview as our client.

32:18

And we want that data to be mostly accessible to people, but some of the data that we’re providing, whether it be environmental data or cost data, is the kind of data that we just don’t need everybody in the world to be able to see.

32:38

So we do need to be able to restrict data access.

32:44

So PARCview has a PARCsecurity application where you can configure the data access that you need for your facility.

32:55

If you need to have different tiers of data access for operators, engineers, or suppliers, for example, you may want suppliers to be able to access certain data for what they manage for you, but not access general plant information, especially if you have some costing information on your system.

33:14

You can use our security applications to restrict that data through our APIs, through our UA server access, and only provide the data to the user that is appropriate for their context.
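
As a rough picture of what tiered, tag-based access means, here is a Python sketch that filters a tag list by role using wildcard patterns. The roles, tag names, and pattern table are invented for this example; they are not PARCsecurity configuration and the real product's model may differ.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

# Hypothetical role-to-pattern table; unknown roles get no access.
ROLE_TAG_PATTERNS: Dict[str, List[str]] = {
    "operator": ["PM1.*"],
    "engineer": ["PM1.*", "ENV.*"],
    "supplier": ["PM1.CHEM.*"],
}


def visible_tags(role: str, tags: Iterable[str]) -> List[str]:
    """Return only the tags that match one of the role's allowed patterns."""
    patterns = ROLE_TAG_PATTERNS.get(role, [])
    return [tag for tag in tags if any(fnmatchcase(tag, p) for p in patterns)]
```

With this table, a supplier would see `PM1.CHEM.FLOW` but not `COST.FIBER`, which matches the goal of sharing what a supplier manages without exposing costing data.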

33:33

Kind of dovetailing in with the last slide here, exposing data to third-party applications where obviously AI and general machine learning applications are the heavy hitter at the moment.

33:51

We’ve continually attempted to expand out what options are available to customers to get some of this data, so who knows how many more things there’ll be in a few years.

34:04

But at the moment, we expose a bunch of these things.

34:08

We have a REST API, we have our own DA server, we have our own UA server, we have a historian SDK.

34:15

There are low-level gRPC calls, if you really need maximum performance.

34:22

We have a lot of options as far as the protocols in which you can get data.

34:30

But as far as how you would do that safely, it really depends on where you’re attempting.

34:37

Well, it depends on who is initiating the request, whether we’re pushing data or whether someone is asking for data, whether someone else is initiating the request.

34:50

If you’re initiating the request from the enterprise zone, then it’s pretty simple since it’s going to be similar to how a PARCview client or a UA server would request the data.

35:01

It’s probably going to reach into the DMZ, so you generally just have to expose one firewall port, and then it will be encrypted on almost any pathway that you might use.

35:13

The exception, again, is OPC DA: there’s only so much you can do with it since it’s a legacy protocol.

35:20

Again, with the exception of OPC DA, almost everything is going to be authenticated and encrypted, so it’s quite secure.
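As a rough sketch of what an authenticated, encrypted pull from the enterprise zone can look like, the Python below builds (but does not send) an HTTPS request to a historian endpoint in the DMZ. The host name, the `/history` path, the query parameter names, and the bearer-token scheme are all assumptions made for illustration; they are not taken from the dataPARC REST API.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import Request


def build_history_request(base_url, tag, start, end, token):
    """Build an authenticated request for one tag's history.

    The endpoint layout and parameter names are hypothetical.
    """
    # Refuse plain HTTP so credentials never cross the firewall unencrypted.
    if urlsplit(base_url).scheme != "https":
        raise ValueError("refusing to send credentials over an unencrypted connection")
    query = urlencode({"tag": tag, "start": start, "end": end})
    return Request(
        f"{base_url}/history?{query}",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Accept": "application/json",
        },
    )


req = build_history_request(
    "https://historian.dmz.example:443/api",  # single exposed port in the DMZ
    "PM1.Reel.Speed",
    "2024-01-01T00:00:00Z",
    "2024-01-01T01:00:00Z",
    "example-token",
)
print(req.full_url)
```

Passing the result to `urllib.request.urlopen` would actually send it; that step is left out here because the endpoint is fictional.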

35:31

Then, as Wayne mentioned on the other slide, there are going to be tag-based permissions, so you can control what people get.

35:39

As far as what you can connect, again, people are increasingly trying to funnel this data up to AI platforms and AI agents and things like that.

35:50

A lot of that is bulk pushes.

35:54

It depends on what the application is, but a lot of times they want nearly real-time data and they want quite a bit of it.

36:01

It’s a matter of determining what that cadence looks like and how much you’re going to push.

36:09

But again, things like the gRPC calls, the SDK, they do well for that sort of thing.
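One way to reason about push cadence is a buffer that releases a batch when it reaches a size limit or when its oldest sample reaches an age limit, whichever comes first. The Python sketch below is a generic illustration of that idea; the class name and limits are hypothetical and it does not use the dataPARC SDK or gRPC interface.

```python
class BatchBuffer:
    """Collect samples and release them in batches by size or by age."""

    def __init__(self, max_samples, max_age_s):
        self.max_samples = max_samples
        self.max_age_s = max_age_s
        self._samples = []
        self._first_ts = None  # timestamp of the oldest buffered sample

    def add(self, ts, tag, value):
        """Buffer one sample; return a batch when it is time to push, else None."""
        if self._first_ts is None:
            self._first_ts = ts
        self._samples.append((ts, tag, value))
        too_many = len(self._samples) >= self.max_samples
        too_old = ts - self._first_ts >= self.max_age_s
        if too_many or too_old:
            return self.flush()
        return None

    def flush(self):
        """Return everything buffered so far and start a new batch."""
        batch, self._samples, self._first_ts = self._samples, [], None
        return batch


buf = BatchBuffer(max_samples=3, max_age_s=10.0)
for ts in (0.0, 1.0, 2.0, 5.0, 16.0):
    batch = buf.add(ts, "PM1.Reel.Speed", 100.0 + ts)
    if batch is not None:
        print(f"push {len(batch)} samples")
```

A small size limit with a short age limit approaches real time at the cost of more requests; larger limits trade latency for throughput.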

36:17

Again, for Cloud services, a lot of times the Cloud services are really just there to facilitate things like AI where you’re storing it in a data lake or something that makes it more easily accessible to the other Cloud platforms.

36:33

But sometimes it’s also to power things like the third one, like Power BI or any other analytics tools that are a little bit harder to get to talk directly to, say, the DMZ.

36:46

So, you know, again, it’s a sort of expansion of the Purdue model in that you are attempting to kind of add more layers, right?

36:55

More network segmentation.

36:56

And sometimes it does mean that you wind up with kind of like duplicated data, but that means that, you know, if things aren’t kind of actively requesting that data all the time from within the mill network, then it is a little bit more secure as well.

37:10

So often that trade-off is worth it.

37:15

So let’s talk about just a couple real-world AI and integration examples.

37:21

Common use cases: backfilling data for missing periods.

37:24

Again, I talked about in the previous slide that your low-level control system may have short-term historized data and anything we could do to integrate that.

37:36

We do have several pathways where we have that capability, and that adds an extra layer of data collection protection to the system.
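Before anything can be backfilled, the missing periods have to be found. A minimal Python sketch of that first step, assuming a tag that is expected to update at a roughly fixed interval (the interval, tolerance, and sample times are made up for illustration):

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval, tolerance=1.5):
    """Return (last_seen, next_seen) pairs where samples are further apart
    than expected_interval * tolerance."""
    limit = expected_interval * tolerance
    ordered = sorted(timestamps)
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > limit:
            gaps.append((prev, curr))
    return gaps


t0 = datetime(2024, 1, 1, 0, 0)
# One-minute data with samples missing between minute 2 and minute 7.
samples = [t0 + timedelta(minutes=m) for m in (0, 1, 2, 7, 8)]

for start, end in find_gaps(samples, timedelta(minutes=1)):
    print(f"backfill needed from {start} to {end}")
```

Each gap is a time range that could then be requested from whatever short-term history the control system holds.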

37:45

Uploading data to the cloud, many times you may store the data on premises, but you have cloud analysis services or other cloud resources that need access to your data, and we do provide many ways to move that data up, as Nick was talking about.

38:05

Pulling external sources of data: more and more data is becoming available from the outside world.

38:14

Weather stations: if you do have weather stations, it may be important to you to monitor the river level for effluent discharge at your plant.

38:25

If you have access to the actual weather service information about that, that can be very convenient.

38:35

Commodity prices: real-time energy pricing is becoming a way to minimize energy cost in a plant, and that is external data.

38:51

If you can access it in real time, you could do your analysis in real time and make decisions about how you run your process.
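As a small illustration of combining external and process data, the Python sketch below looks up the most recently published energy price for a given time and multiplies it by a measured load to get a cost rate. The prices, times, and load are invented, and the lookup is a plain "last known value" match rather than anything specific to a historian.

```python
from bisect import bisect_right


def price_at(price_times, prices, t):
    """Return the most recently published price at time t.

    price_times must be sorted ascending; times are in seconds.
    """
    i = bisect_right(price_times, t) - 1
    if i < 0:
        raise ValueError("no price published yet at this time")
    return prices[i]


def cost_rate(power_kw, price_per_kwh):
    """Energy cost per hour at the current load and price."""
    return power_kw * price_per_kwh


# Hypothetical hourly prices (currency per kWh), keyed by seconds since midnight.
price_times = [0, 3600, 7200]
prices = [0.08, 0.20, 0.10]

load_kw = 1500.0  # hypothetical measured load from a process tag
now_s = 4000      # falls inside the second pricing hour
print(cost_rate(load_kw, price_at(price_times, prices, now_s)))
```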

38:59

Writing from external sources.

39:01

You may have ERP systems, and you want to provide that data down at the historian level so that it’s accessible by operations, which is much easier than having to access a business system.

39:15

We can do those kind of integrations and get that data to who needs it without that person needing a lot of expertise to get to the data.

39:28

If we can provide it in a user interface they’re used to working with in real time, then they can make real-time decisions with that data.

39:37

And it doesn’t get isolated from the people that need to make decisions with that data.

39:45

So what does this enable?

39:47

We can train machine learning models on historian and external data.

39:54

We can integrate all that data for our machine learning models.

40:00

We can do AI-driven anomaly detection and optimization if we’re monitoring every bit of data in the process.

40:06

Modeling can reveal things that are going on in the process that might not be obvious to an operator or to supervision and maintenance.
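To make the idea of anomaly detection concrete, here is a deliberately simple statistical baseline in Python: a rolling z-score that flags a value when it sits far from the mean of the preceding window. It is a stand-in for illustration, not a machine learning model and not the method used by any particular product; the window size, threshold, and data are assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_zscore_anomalies(values, window=20, threshold=3.0):
    """Return indexes of values more than `threshold` standard deviations
    away from the mean of the previous `window` values."""
    history = deque(maxlen=window)
    flagged = []
    for i, v in enumerate(values):
        if len(history) == window:  # only score once the window is full
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(v - mu) / sigma > threshold:
                flagged.append(i)
        history.append(v)
    return flagged


# Synthetic signal: a small repeating ripple with one spike at index 30.
values = [50.0 + 0.1 * (i % 5) for i in range(40)]
values[30] = 58.0

print(rolling_zscore_anomalies(values))
```

A baseline like this catches abrupt excursions on a single tag; relationships across many tags are where trained models tend to add value.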

40:22

We can operationalize model results back to the floor.

40:25

We can use this kind of data to make decisions about how we run our equipment to optimize, whether for cost, productivity, or quality.

40:37

And we can bridge OT data to enterprise systems.

40:41

So we can take that data that’s from the floor and we can get that data all the way to enterprise systems.

40:48

So an example I like to use is environmental management.

40:53

If I can see everything about my environmental monitoring equipment at the floor level and get that all the way to the corporate level, that makes compliance much easier, because you can actually see the data and provide it to your compliance agencies in a secure and reliable way.

41:17

So this is a slide on governance.

41:21

And I think it’s worth calling out that what we’re talking about here is not, say, access control, which Wayne already discussed, but more what we’re going for and what we do or don’t allow.

41:40

And I think the first thing that’s worth mentioning is we’re a company that came from engineers and is still pretty largely comprised of engineers.

41:50

And we often wind up in control rooms, and we’re using our own software.

41:55

And so a lot of it is built to serve our own needs.

41:59

And so a big one there is speed, because no one wants to wait for data to load if they can help it.

42:07

Obviously, sometimes there’s things that are hardware limitations and whatnot that make that unavoidable.

42:15

But as much as possible, we don’t want to be the bottleneck for getting you the data, right?

42:20

We’re hoping that, again, given reasonable hardware, that we can serve up that data in a way that’s very snappy, that makes it easy and fun to use it and to make good decisions in real time.

42:36

And then as far as ownership goes, we just generally don’t have any restrictions. It’s your data and you can do with it what you want, whether that’s replicating it to some other system or any crazy thing you can think of.

42:52

We don’t gate it in any sense as far as placing restrictions on how you use it, and the data doesn’t expire or anything like that.

43:01

If you’ve historized it, it’s yours.

43:05

It’s a little harder to use it without some of our software, but that doesn’t mean you own it any less than any of the other things at the site.

43:15

And then last, again, we kind of had already talked about this a little bit, but we shoot for a lot of flexibility as far as how you can access the data.

43:23

And we generally don’t do things from a modular licensing standpoint.

43:28

Almost all of our stuff, if you license the historian, is just available to you out of the box if you want to use it.

43:37

Great, thank you, Wayne and Nick.

43:39

And it does look like we have a couple of questions, so I can go through those with you guys now.

43:46

One question is, what is the biggest mistake you see when configuring the data historian?

43:53

All right.

43:54

One thing that we’ve had customers do in the past that we had to adapt to and sometimes we had to change was putting the historian too low in their reference architecture, putting the historian at level three.

44:08

We do have some tools that help us adapt to that, but having the historian in your OT network is usually limiting. The other one is not having a growth plan.

44:25

Of course, plants go through capital cycles when they’re working on their IT infrastructure and their OT infrastructure, and we need to be aware of that.

44:33

We are going to become a pretty major user of IT resources, and that needs to be planned for in the future.

44:45

We need to be added to the long-term plan.

44:53

OK, well, thank you.

44:54

Let’s go to the next one.

44:57

How do you recommend manufacturers approach the Purdue model when their OT and IT networks are already heavily integrated?

45:08

So, background we didn’t discuss at the beginning of this meeting, I actually have 20 years in the process industry and went through this very problem.

45:18

And you just have to decide that you’re going to improve your cybersecurity structure and get things to a more secure infrastructure.

45:36

Investigate first always.

45:39

If it was easy to move data between multiple levels in your network, between business systems and control systems, somebody did it.

45:47

Just accept the fact that it’s going to be there.

45:50

Go find the structure.

45:52

And there are plenty of tools for that, where you can sniff what traffic is going on the network.

45:56

Find it and start migrating to a security model. It may not be the Purdue model right away. You may use VLANs as a stopgap measure, or a firewall as a stopgap measure, to just start closing things up.

46:10

I like to think of it myself sort of like the OSHA hierarchy of controls.

46:16

OSHA has a hierarchy of controls for safety systems. Elimination: break connections.

46:22

If the connection doesn’t need to be there, if it doesn’t fit some reference architecture that you are managing, break it and get it to a managed connection.

46:34

Engineering controls, that’s firewalls.

46:37

In safety, we talk about engineering controls, whether we put guardrails or something like that up, that’s what a firewall is.

46:44

Administrative controls, change management.

46:47

Once you start migrating to where you have a managed and planned network segmentation, make sure that any changes to that go through a change management structure, so that you can maintain the security you will have spent a lot of time putting back into a system that didn’t have it before.
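The "investigate, then eliminate or manage" approach can be sketched as comparing observed network flows against an approved list of managed connections. The Python below is a toy illustration under stated assumptions: the zone names, the approved list, and the observed flows are all hypothetical, and real input would come from a traffic capture or firewall logs.

```python
from collections import namedtuple

Flow = namedtuple("Flow", "src_zone dst_zone port")

# Hypothetical reference architecture: the only connections that are
# approved and go through change management.
APPROVED = {
    Flow("enterprise", "dmz", 443),   # HTTPS from business clients to the DMZ
    Flow("dmz", "control", 4840),     # OPC UA from the DMZ down to the control zone
}


def triage(observed, approved=APPROVED):
    """Split observed flows into managed ones and candidates to break or formalize."""
    managed = [f for f in observed if f in approved]
    unmanaged = [f for f in observed if f not in approved]
    return managed, unmanaged


observed = [
    Flow("enterprise", "dmz", 443),
    Flow("enterprise", "control", 3389),  # direct remote desktop into the control zone
    Flow("dmz", "control", 4840),
]

managed, unmanaged = triage(observed)
for f in unmanaged:
    print(f"review: {f.src_zone} -> {f.dst_zone} on port {f.port}")
```

Anything in the unmanaged list is either broken outright or brought into the approved set through a documented change.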

47:11

Okay, thank you.

47:12

I think we have time for one more.

47:16

How do you know when your historian infrastructure needs to be upgraded?

47:20

What are some of the warning signs?

47:24

Yeah, there’s generally going to be a couple, and some of it depends on whether it’s a full upgrade to an entirely different system or you just need to do some maintenance. I think people do sometimes neglect the maintenance on their historian: adjusting things like data collection rates, removing dead tags, maintaining your backups and your disks and whatnot. That can actually make a big impact on your day-to-day operations.
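Finding dead tags is one maintenance task that is easy to automate. A minimal Python sketch, assuming you can export each tag's last update time from the historian (the tag names, dates, and the 90-day cutoff are made up for illustration):

```python
from datetime import datetime, timedelta


def stale_tags(last_update, now, max_age=timedelta(days=90)):
    """Return the names of tags whose last update is older than max_age."""
    return sorted(tag for tag, ts in last_update.items() if now - ts > max_age)


now = datetime(2024, 6, 1)
last_update = {
    "PM1.Reel.Speed": datetime(2024, 5, 31, 23, 59),
    "OldPump.Amps": datetime(2022, 11, 3),       # equipment removed long ago
    "Boiler2.Steam.Flow": datetime(2024, 1, 15),
}

print(stale_tags(last_update, now))
```

The output is a review list, not a delete list: some tags legitimately update rarely, so the cutoff should be chosen per site.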

47:55

But assuming that all the busy-work maintenance tasks are properly being taken care of already, I think the big one is your hardware starting to age out.

48:12

The really big one is, of course, if you just can’t patch a server anymore. If it’s a Windows machine and the operating system is too old for Windows updates, that’s a really big warning sign in our current environment that it probably needs a massive refresh.

48:31

Hardware-wise, again, if you’re constantly running at a really high 90 to 95 percent CPU and/or memory load, then obviously your hardware probably just can’t keep up with at least the way you’re currently using the system.
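The word "constantly" matters here: a brief spike is normal, while sustained load is the warning sign. A small Python sketch of that distinction, assuming you already collect periodic utilization samples (the 90 percent threshold comes from the discussion above; the 90 percent-of-samples cutoff and the sample data are assumptions):

```python
def sustained_high_load(samples, threshold=90.0, min_fraction=0.9):
    """Return True when at least min_fraction of the utilization samples
    (in percent) are at or above threshold."""
    if not samples:
        return False
    high = sum(1 for s in samples if s >= threshold)
    return high / len(samples) >= min_fraction


# Hypothetical CPU utilization samples, one per minute.
busy_server = [95.0] * 19 + [40.0]
spiky_server = [95.0, 20.0, 30.0, 96.0]

print(sustained_high_load(busy_server))
print(sustained_high_load(spiky_server))
```

The same check works for memory utilization by feeding it memory samples instead.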

48:47

Again, everything is pointing towards heavier use in the future, not lighter.

48:52

So it may be worth the capex to try to upgrade that to something better that is more future-proof for what Wayne had discussed earlier in the presentation.

49:06

And then last, and this one is a little bit harder, of course, because it’s not so analytical, I guess: if the data just feels sluggish when you’re using your applications, if you’re not getting it in a fast enough timeframe to make decisions.

49:24

So again, if you’re trying to alarm on something like downtime for a machine, and if you’re not seeing that for minutes or even longer after the event has happened, it’s a question of, could you do better if you had it faster and more responsive and whatnot?

49:45

And that I think is also a good sign.

49:46

Or if the user experience just isn’t good, then again, maybe it’s time to either give it all a once-over and see if the existing system can be improved in some fashion, or to just try something new or upgrade.

For instance, if you have PARChistory, it’s not a bad time. The upgrade is an upgrade to store. It’s not a one-to-one by any sense of the word, but store addresses certain issues that we’ve had with PARChistory, especially around uploading large quantities of data.

So again, if the current historian just isn’t matching some of these requirements, it’s always a good sign that it might be time to upgrade to a newer system.

Okay, well, I think that’s all the time we have. If we didn’t get to your question, or if you think of one later, we’ll reach out to you via email, or we may post the questions with answers underneath this webinar when we post it on our website or YouTube.

So once again, thank you very much for joining us today, and see you all next time.
