DevOps and Resiliency in HealthTech Panel Event Recap

Joe Ferris and Victoria Guido

Thank you to the organizers and community of DevOpsDC for putting together this event. Here’s a summary of our conversation - you can view the video and find the panelist information on the event page.

How can DevOps help meet regulatory compliance and security needs in HealthTech?

David Greene: From where I sit, DevOps is the heartbeat of security and compliance: it's what automates these things, and if we can't automate them, we're going to fall behind in this space. We're not the only players, of course, but especially if we're doing DevSecOps together, this is a place where DevOps really belongs.

Lauren Maffeo: I'll add that when we talk about data and governance, that's a term that seems pretty nebulous, and it can seem almost theoretical to implement in real-world situations, especially in organizations with very complex, large amounts of data, like the healthcare sector. The way I always put data governance to people is to think of it in three words: automate your standards. You need data quality standards and compliance standards; all of that is part of governance, and it's work you're probably already doing. But if you don't find a way to integrate those standards into your development and production lifecycles, there's not much purpose to them, in my opinion. So not only creating your standards but automating them is essential, and it's especially crucial in an area like healthcare.

How can we move away from spreadsheet-driven compliance exercises to automation in this space?

David Greene: One of the things I'm looking at, and Victoria, you and I were chatting about this today, is OSCAL as a possibility for the future. Today it's not ready, but the idea is that by using data formats and standards for how we break down how security scans are going, what our deployment status is, and whether our controls are in place, we can move away from so much manual work toward automating the compliance exercises. Especially in healthcare, we have so many regulations to work with.

Jake Selby: I think in DevOps, the automation of validations and security checks is a key component of having a full CI/CD lifecycle and true continuous deployment. Any time you're trying to automate software delivery it's important, and to piggyback on the others' answers, in the healthcare domain it's extra important because the stakes are so high. That characterizes working at a healthcare data analytics company: the security of the data, and making sure we're only providing the data to those who are authorized to see it, is of paramount importance. When it comes to HIPAA, and the potential fines and reputational hit that come along with any type of breach, I can't think of a better use case for automating security and governance.

Victoria Guido: Absolutely. And for the long-time members of DevOpsDC, of course, we all know security is important. But Joe in particular has experience working with founders who may not have a technology background. How do you convince them that security matters, in a way that makes sense and makes them prioritize it when they're building products from the beginning?

Joe Ferris: I think it's really hard for it to become real to people until they've had their first close call. The way I explain security to founders is that it's kind of like fire safety: it doesn't seem that important until something catches fire, and then it's all you care about. And having your whole dataset exfiltrated, which is a real threat for a lot of people, is the end of the company for a lot of companies. I think automated governance is really exciting, but I also think that for most organizations, particularly early founders, there's a lot of low-hanging fruit. Most people get far enough that they have a corporate bank account and probably an AWS account, but then they don't know what to do from there. Really simple things, like making sure you have single sign-on, making sure you have a documented onboarding and offboarding procedure, and just disabling most of the risky defaults in your AWS organization, can save you from a lot of potential headaches.
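To make that low-hanging fruit concrete, here is a minimal Terraform sketch of the kind of account-level defaults Joe describes. It is purely illustrative (not thoughtbot's actual modules), and the settings shown are a starting point rather than a complete baseline:

```hcl
# Hypothetical account-level guardrails for a new AWS account.
# Illustrative defaults only, not a complete security baseline.

# Block public S3 access across the whole account.
resource "aws_s3_account_public_access_block" "this" {
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Encrypt all new EBS volumes by default.
resource "aws_ebs_encryption_by_default" "this" {
  enabled = true
}

# A sane password policy for any IAM users that still exist after SSO.
resource "aws_iam_account_password_policy" "this" {
  minimum_password_length = 16
  require_symbols         = true
  require_numbers         = true
}
```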

The design challenge of data governance

Victoria Guido: Right. And we also talked about setting up governance and the process around really understanding what data you have. Lauren, do you have any insights on how a group could set something like that up, which can sound like a lot of overhead, work, and process, and make it easier and actually accomplishable?

Lauren Maffeo: I do have a few tips that I talk about in the book. One of them is starting with a data framework to inform all of these efforts we're talking about. Security and compliance are huge, transparency and trust are huge, ethics is huge. I titled the book Designing Data Governance because I am a service designer, and I do believe data governance is a design challenge to be solved. If you design systems, services, and products with governance in mind, it becomes much easier to automate the governance standards you set, rather than having to go in and retroactively do it. Those of us who have worked in very bureaucratic organizations, whether in healthcare or government, are very familiar with the high amounts of technical debt in those organizations, and retroactively going in and applying DevOps certainly can be done, it is done every day, and it's incredibly painful for the people who have to do it. So if you're starting from scratch, I think there's a great opportunity to find a framework that already exists. I used to work for Gartner, and they have a seven-step data framework that I think is really valuable because it encompasses everything from education and training to collaboration and security. So finding a framework to start with is really important. I also think it's essential to define your key data domains across the organization: what are the key groups of data you have, and who, if anybody, owns them today. The third thing to do in the first instance is to map those data domains, and the use of the data in those domains, back to your business strategy. Any data-driven effort should impact the business by advancing its key goals, and I'm constantly amazed by how infrequently senior leaders can articulate those goals. If you ask a lot of them why they're doing this migration, or why they're building this model, the answer is that the Oracle database is coming up on end of life, or there's some compliance reason. Sure, those are valid, but that's not enough. Those three things may sound simple, but they're a lot of work in and of themselves, and they're worth investing in.

Victoria Guido: That makes sense. Maybe it would help to use a specific example, like the API sending healthcare data back to ACOs. David, maybe that's a question or a story you can tell us about: how do you manage that data, what are the security concerns around it, and how do you automate some of that and figure out who the right people are that need to be involved?

David Greene: Yeah, I've worked on a couple of different projects at CMS that were sending claims data to caregivers at various levels. A lot depends on how often these relationships change. Who is my doctor? Well, I might have five of them, and if I'm a snowbird I spend time in different states and may switch doctors, so I don't want the old doctor to keep getting all my information. Depending on how well defined these relationships are, it can be a lot easier to deal with. In the best case, where healthcare information was being sent to ACOs, in my experience even there there's an awkward moment at the end of the year when data has to stop flowing. It's just not a great experience for the ACOs and the doctors depending on that data; it's a side effect of the way the models are put together. But it's even harder in the case of picking direct providers. Is Dr. Smith my doctor, and how can we tell? Well, Dr. Smith can say "I promise," or we can have the patient go in and confirm it, but at some point usability really falls down. And this falls in the DevOps realm in terms of building and maintaining systems, making sure they can scale properly, and making sure we're not spending more money on compliance, or raising more risk by getting the data out there, than we are solving actual health risks by getting it to providers. So it's a very interesting space. We have healthcare exchanges in the private sector that deal with similar kinds of questions, and I see this as an early phase. We have such a complicated healthcare system in the US; it would be nice in five or ten years to have incrementally solved some of these data-sharing issues, so that we really can get better care.

Jake Selby: Just to dovetail on the last point you made, I agree that we're in a similar place to where the financial industry was before it standardized how financial data is transmitted between institutions; right now we have all these different players in the game. So I wholeheartedly agree on standardization, and I know that's something the industry has been trying to push toward. It's interesting to hear your perspective from the CMS side, producing the data and pushing out the API. As a consumer of a lot of CMS data on behalf of our clients, we have an interesting use case where we want to build a bunch of analytics on top of that data and then provide it back out through a publicly accessible API in certain circumstances. I'm sure there are a lot of similar concerns, but it brought up questions I'd never had to ask before, some pretty basic codebase management questions: who should be allowed to hit the merge button on a pull request when that pull request could potentially open up a major security vulnerability? What kind of governance do I have on my codebase? How many approvals do I need? What validations need to happen in my CI/CD workflow before we promote this to production? Most of our APIs to date have served non-PHI, aggregated data, so this is a completely different ballgame and a different series of questions. I don't think we have all the answers, but we're doing our best.

Victoria Guido: I know, Joe, you've been working on creating a more standardized approach to how we build brand-new software and solutions. Do you have some thoughts on how that's going, and where you see the industry trying to pick up new standards?

Joe Ferris: Yeah, we've spent a lot of time deploying to platform-as-a-service offerings like Heroku, and we've done some deployments to other platforms as well. What we found is that when you start to go through stricter compliance frameworks, or you have people with specific questions about how you handle data, you really need to know the details: what cryptographic standards are being used, whether we have unique keys for these different datasets. Some of the providers can give you that, but it's a journey. If you go the opposite route and decide to own your own infrastructure, deploying to AWS is a complete blank sheet of paper with many, many wrong answers, and for most founders a lot of those decisions are not special to their use case. A lot of them don't even realize they're going to have interesting data; they end up with data as a side effect of whatever they were there to do, like facilitating the administration of vaccines, and then they end up with vaccination records they're required by law to keep, and suddenly they have all these data concerns. So what we've tried to do is answer a lot of those out-of-the-box questions with decent, sane defaults and build modules, mostly using Terraform, that follow some of those best practices and enable by default the security features that a lot of platforms, even AWS, leave off. In terms of building out the modules, I think it's been going well; it's been more difficult in terms of adoption. There's a problem I've seen in DevOps, and in other communities, where everybody invents their own thing, so everywhere you go you have to learn their special approach; everybody has their private modules. So one thing we decided to do was make ours public and see how much engagement we got. We have a few companies using them now, but it is an interesting challenge.
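As a rough illustration of the "sane defaults" idea (again, a hypothetical sketch rather than the published modules), a wrapper module can bake in the security settings that are off out of the box, so every consumer gets them without thinking about it:

```hcl
# Hypothetical wrapper module: an S3 bucket with safer defaults baked in.
variable "name" {
  type = string
}

resource "aws_s3_bucket" "this" {
  bucket = var.name
}

# Encryption at rest is enabled for every bucket created through this module.
resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
  bucket = aws_s3_bucket.this.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Public access is blocked unless someone deliberately changes the module.
resource "aws_s3_bucket_public_access_block" "this" {
  bucket                  = aws_s3_bucket.this.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Versioning makes accidental deletions and overwrites recoverable.
resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id
  versioning_configuration {
    status = "Enabled"
  }
}
```

A team then calls the module (for example, `module "uploads" { source = "./modules/secure-bucket" name = "uploads" }`) and inherits the hardened settings by default.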

Lauren Maffeo: I think that's a great point, Joe, especially when you think about internet-connected devices in healthcare that generate that data. Because as you said, that's how you inadvertently get non-technical data stewards: people who are overseeing these devices in one way or another, and those devices are generating data. Like it or not, that becomes a compliance and security challenge you have to address, and then it becomes a question of who is the best person to own and make decisions about that data. I think that's actually a great opportunity to leverage non-technical employees as possible data stewards throughout an organization. It doesn't mean those people necessarily become DevOps engineers, but it can mean they have an active hand in defining the quality standards for the data and metadata that the products they oversee are generating. This can get very granular depending on what you're talking about, but I do think it's a really important point. Everybody talks about IoT and internet-connected devices, but those devices generate data, and then the question becomes who owns it, who cleans it, and who oversees it as it goes through pipelines. Those are all decisions that have to be made, and left unchecked, more data is not always better data. Everybody here probably knows that, but it's worth saying, because I often see the volume of data, big data, framed as a positive or as an end goal in and of itself, and it's very much not; especially in a field like healthcare, it can actually become a liability.

Security, Terraform, and threat modeling

Victoria Guido: Right, so it sounds like if you combine those two ideas, if you have a standardized approach for compliance and security in how you build your infrastructure, then someone who maybe isn't the most advanced engineer can still own the data and know where it goes, how it's being handled, and where it's being stored. And on Terraform specifically, I wanted to ask David a little bit about threat modeling. I know he's been working in that space with his team.

David Greene: Yeah, one of my co-workers presented on this recently. Sometimes in the DevOps space we're not just implementing the ideas of others; we're also responsible for some security compliance, or we're driving some development. In this case they needed to do some threat modeling on the team, and as a DevOps engineer, reaching for HCL, the language that Terraform uses, was a great opportunity to describe the things we're trying to protect: what our assets are, and what the risks against them are. There's a little tool called hcltm that they were able to use to walk through it, and it really spoke to a team of DevOps people: hey, this is our language, and it's walking us through the things we need to do for compliance. It also has all of the hooks you'd want in a DevOps tool, so you can programmatically do things with that data or come back and review it. So that's another one of those areas where there's a lot more potential out there than is built out yet, but there are a lot of intersections between DevOps and security, and it depends on what your team is doing and how you can take advantage of that.
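For a sense of what that looks like, here is an approximate, hand-written sketch of a threat model in HCL in the spirit of hcltm. The exact block and attribute names are defined by the hcltm spec, so treat this as an illustration of the idea rather than a copy-paste example:

```hcl
# Approximate hcltm-style threat model; check the hcltm spec for the exact schema.
threatmodel "claims_api" {
  author = "devops-team"

  information_asset "claims_data" {
    description                = "Claims data received from CMS"
    information_classification = "Confidential"
  }

  threat {
    description = "Exfiltration of claims data through an over-privileged service account"
    impacts     = ["Confidentiality"]
  }
}
```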

Victoria Guido: Right, that makes sense. And since I haven't heard from Jake in a while: what are you doing at CareJourney to shift things left in security and put those automations in early, before your team pushes code?

Jake Selby: A big theme this year is shifting left across the board. What we've done is sit down and design what our ideal software delivery system looks like, and where we can push as much to the developers as possible. Some of the things we're actively working on right now are pre-commit hooks in our workflow that will notify developers, before they even push anything, if something they're doing is breaking policies. We're also pushing to move to a fully artifact-driven CI/CD system; some of our flows are there and some aren't, but getting as far as generating an artifact, and even having the individual developers' workflows generate that artifact in an importable way, is another area we're working on. And then we're just getting a lot more of the checks earlier in the workflow. Obviously you still have to keep your end-to-end synthetic tests, but the more granular checks you can push earlier in the pipeline, the better, and that's one of the big things we've been working on with the devs, trying to get their workflows going. Some of the other things we're doing to shift left: one of the big themes I've been trying to push is having developers really own their own CI/CD workflows. To date there's been a lot of question about whether it's the DevOps engineering team, who are partially integrated with the developers but not fully, whose responsibility it is to make changes to people's workflows, their pipelines, and so on. The more we can give ownership to the developers, the better; the whole point of DevOps as a cultural mindset is that it's not any one person's responsibility to push the mentality of collaboration and automation that DevOps embodies. So one of the big things we're trying to make sure of is that it's a full team effort. Just because your title is full-stack engineer or data engineer doesn't mean you aren't responsible for ensuring that, on the projects you collaborate on, you're pushing best practices, automating tests and validations, really owning that workflow, and being responsible for answering questions like "why did my test break?" We're putting more onus on the developer to fully embrace ownership of the project end to end, not just "I made my commits, I made a PR, now it's done; oh, I see there's a little checkbox and it's green, I'm good to go," but to be people who really understand how their software delivery system works and what the checks do, and putting ownership of formulating those checks, validations, and synthetic validations onto the developers to write. The more we can create these fully functional teams that own things end to end and get feedback as early as possible, the more we can, as we say, shift left, and really let these agile teams, for those of us who do Scrum and Agile, truly be agile and develop and push things out autonomously.

Victoria Guido: That makes sense, and with some confidence that they're in compliance at the same time, right? I'll go back around in the other direction and see, David, if you have anything you want to add to that.

David Greene: Yeah, one of the things we're looking at right now on my current team is using machine learning to mine my logs and correlate: hey, what is causing things to go down? It's a very DevOps thing, but it fits into the realm of AI. And on the compliance side, we're tossing around ideas at our company: there are so many documents we have to generate, and honestly, ChatGPT writes better than a lot of the security docs I've seen. Is there a way to integrate those kinds of tools in a way that retains privacy, and that doesn't introduce the kind of risk AI brings, where AI can be sure about a thing that's actually wrong, whereas a human would start to get unsure as they get closer to that boundary? There might be some room for these things. So I'm looking forward to seeing my first AI-generated security doc, and to averting our first incident because we were able to see into the logs with machine learning.

Victoria Guido: That sounds super cool. Lauren, do you want to add to that, maybe talking about how to shift data governance a little to the left and bring it earlier into your CI/CD process?

Lauren Maffeo: Yeah, I think that's a great point. For me it starts with giving your development team that set of guidelines and standards early, so that they ideally have input into them. I think the precursor to that is having a data governance council across your organization, where each data domain you've predefined has representation on that council. Again, it doesn't mean everybody is necessarily a developer, a data scientist, or an engineer, but you do have people on a shared council to set standards and define what it means, for instance, to add a new term to your data catalog, or what it means to review your pipelines. If your organization is large enough, you could have subcommittees, for instance one to review all possible tools. The earlier you get those processes integrated into your culture, the easier it is to then take care of the tech, because you've designed those processes and systems with data governance in mind, and you've also given everyone more transparency into how it works for your unique organization. I also think there's a great opportunity to embed designers into these projects. If you have, for instance, a service designer or a product manager, their work managing roadmaps is really helpful here, because they can make a roadmap that outlines the implementation process and the pipeline process with key tasks and milestones. I think product management, treating this like a roadmap, is the most effective way to go, because data governance, especially in healthcare, is not a project that has a conclusion; it really is a continuous lifecycle. You need robust processes and standards to manage that lifecycle, because the work never ends. That can sound scary, but I think approaching it with a product mindset is a really helpful reframe for data governance.

Victoria Guido: And it makes sense to me, thinking about what you're saying and what David's saying, that if you're going to be developing or using AI, and generating even more data, it's better to have people who have some ownership and can make decisions about what's ethical, what's right, and whether we're following guidelines as we go. That makes a lot of sense. Joe, do you have anything to add on shifting security left?

Joe Ferris: I think in terms of getting everybody on board and engaged on the development team, the two things I've found most helpful sound kind of contradictory: introduce decent guardrails, but then, within those safe boundaries, make it very easy for people to collaborate on changes. For example, talking about CI/CD, one of the things we've tried to do for our clients is have the infrastructure itself be managed by CI/CD. Rather than having people apply things through ClickOps, or even running Terraform locally, you can follow the same model developers use for their everyday code and submit pull requests for the infrastructure. That lets you set up some boundaries: things need to be reviewed, somebody is going to look at the Terraform plan before it can get merged, and you can use things like protected branches. But it also makes it possible for people to make pretty widespread changes without ever having escalated privileges on their own account. I've found that if you make it a little safer that way, if you increase the number of mistakes someone has to make before they've broken something, it makes people more comfortable participating and learning, and it creates more momentum on the team.
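One way to express part of that guardrail in Terraform itself, assuming the GitHub provider and hypothetical repository and check names, is to protect the infrastructure repo's main branch so every plan gets a second set of eyes:

```hcl
# Hypothetical guardrail: the infrastructure repo's main branch only accepts
# reviewed pull requests, so every Terraform plan is seen by a second person.
resource "github_branch_protection" "infra_main" {
  repository_id = "infrastructure" # hypothetical repository name
  pattern       = "main"

  required_pull_request_reviews {
    required_approving_review_count = 1
  }

  # Require the CI job that runs `terraform plan` to pass before merging.
  required_status_checks {
    strict   = true
    contexts = ["terraform-plan"] # hypothetical status check name
  }
}
```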

Lauren Maffeo: Yeah, I'll jump in and add to that quickly. In the chapter of my book on governance-driven development, there's a case study throughout on Netflix's experience moving all of its data into brand-new architecture that was very new and unproven, but was the best fit for their needs at the time. One of the things the software engineering director did for his team was create a true space for folks to get more familiar with Kafka clusters, because that was a technology his team was not very familiar with before, and he knew that, left unchecked, mistakes were bound to be made with very large amounts of customer data. So he gave his team a safe space to practice with clusters, knowing they would make mistakes and knowing they would fail, but in a controlled environment where they could get comfortable with the technology in a way that did not negatively affect the user experience. As we all know, you never want your users to know you're doing a migration; if they do, it probably means something went wrong. The fact that he gave them that space to fail is unfortunately very rare, but I saw it as a really awesome example of how to truly build a data-driven culture that is DevOps-centric: giving people the freedom to get comfortable with technology they might not know yet.

Jake Selby: One thing Joe said that I'd also like to add some thoughts on, because it was really key for me, was the idea of giving more flexibility to engineers to push infrastructure requests or changes. On this topic of shifting left: how do we get more responsibility to the engineers? That's something we've been struggling with and trying to come up with a good solution for. Today we have all of our infrastructure managed in Terraform, and there's a discrete set of people who are able to actually make pull requests and apply changes. The problem with that is, as you have more and more development teams creating all these things, "I need an S3 bucket, I need a KMS key, I need whatever," the more of those requests go over the wall to another team, and you start having roadblocks and process bottlenecks. There's also a knowledge gap: the developers don't understand anything about the actual infrastructure the thing is built on. So we've been trying to push toward that. The problem is you go from a single repo with all of my Terraform code to all these different codebases. One thing I've seen the industry really starting to lean toward, and at AWS re:Invent a lot of the speakers were pushing for this pretty strongly, is moving toward the AWS CDK, which takes some of the infrastructure code definition and moves it into the actual repository that houses the application. So one area we're working on now is finding places where we can say, all right, let's take this very application-specific infrastructure code, move it into this application's codebase, and then allow the developers to say, I'm going to open a PR asking for this bucket. You still have governance, because you can use things like CODEOWNERS files and features of that nature in GitHub to enforce that certain people review changes to certain files, or even a whole codebase. So we're still able to ensure that the team that manages the infrastructure has the final say on whether a pull request is approved, but this lets developers get their hands on the infrastructure a little more and move faster, while still maintaining the governance. It's a best of both worlds, and it gets you closer to the DevOps mindset of distributed responsibility. Anyway, that's just a little bit of it, but Joe, what you were saying about developers pushing code for infrastructure made me think of that.
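A sketch of how that can look in practice, with a hypothetical module path and bucket name: the application repository carries a small Terraform file for its own resources, and a CODEOWNERS rule routes review of that directory to the infrastructure team.

```hcl
# infra/storage.tf inside the application's own repository (hypothetical layout).
# A developer can open a PR adding this file, and a CODEOWNERS rule such as
#   /infra/ @org/platform-team
# ensures the infrastructure team still approves the change before it merges.
module "report_exports" {
  # Hypothetical shared module maintained by the infrastructure team.
  source = "git::https://example.com/terraform-modules//secure-bucket"

  name = "app-report-exports" # hypothetical bucket name
}
```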

Joe Ferris: Yeah, the point about the repository boundary is really interesting. We still keep Terraform code in a separate repository, but there's some magic change that happens for people when it's in the same git repository they already have. On one of the projects where we had the most developer engagement, we had the Helm charts in the application repository. We did it initially just for convenience, because the CI/CD we were using was not very good at coordinating multi-repository pushes. But then what we saw was that we stopped getting tickets like "can you set an environment variable for me?" Because it was just in their repo, and they already knew how to open pull requests. I think it's easy to underestimate the barrier you create just by making even another directory.

Jake Selby: Yeah, we have a repo now where the Dockerfile, the Helm charts, and the values files are all in one place, so you can really define your whole deployment, and it has absolutely accelerated things.

Victoria Guido: Well, speaking of accelerating things and making teams go faster, I wanted to bring up a question from the group in the chat: how do you measure success in this situation, especially on the security side? Anybody want to jump in and pick that up?

David Greene: A soft measurement I would toss out there is: does every single person on all of your teams feel responsible for security? Are they actually making suggestions? If so, I think you've got a healthy culture and you are achieving success, because we can never just sit on a particular status. So I would offer that as at least one measure.

Lauren Maffeo: I would add, on the subject of security, that we know breaches from within are one of the major ways security flaws happen in organizations, whether through phishing or other mechanisms; it's very often the case that attackers get in through an employee. As a result, there is now a culture in a lot of organizations of making sure everyone is responsible for security. For instance, if you fail a phishing test, which my security team at Steampunk sends out regularly, you have to take a little training course they've made. That hits on two areas I think are important: you're sending automated tests to see who falls for them, and then you're providing training that helps people spot more sophisticated social phishing. That's what we mean when we say you need to educate your employees; it's a great way to do it. In modern tech organizations, I think it's safe to say we expect security to be a shared responsibility. I think we need to move toward making data a shared responsibility too. I still don't think that, writ large, most organizations are there yet, especially not in the more bureaucratic organizations and industries like healthcare. So the shift toward shared security is there, even if loosely, and I think data needs to follow.

Jake Selby: Yeah, I agree with both points. In the security world, especially with healthcare data, it's a constantly shifting goalpost: once you reach one objective, it's "how do we get better from here?" In terms of measurable things, obviously the highest objective is not to get hacked and not to leak anything. But beyond that, I think it's: do you have the processes and procedures in place to assess where you're at and create actionable plans for making yourself better tomorrow and the day after? If you have that type of process in place, to make real assessments and actually do something about them, that's success in my book.

Joe Ferris: I think one important measure of success is how a team responds when something actually goes wrong; mistakes are inevitable. For example, if you have at least a partial security issue, does it immediately become a full escalation, or do you have safeguards in place that contain the damage? And then in terms of the team, how do they react? Is there finger-pointing and blaming, do people just try to wash their hands of it, or is there an honest discussion about how it happened? Do you have people trying to design a new system where they feel safer about not making that mistake again?

Building resiliency in teams

Victoria Guido: Yeah, that's a great point, and it leads me to my next question about building resilient teams. DevOps engineer is a very in-demand job title, especially in health tech, and we don't want to be burning people out. Someone in the group chat also asked about the cognitive load of having to be cross-functional, being responsible for data and responsible for security. So what are some strategies you have for building resiliency into your current teams, or into your client teams?

Lauren Maffeo: I would say sharing the load. That's a big benefit of data stewardship, which is the concept of data ownership in an organization. There are many benefits to data stewardship, but one of them truly is sharing the responsibility of data ownership across an organization. I think data governance is still perceived as a top-down set of rules that come from IT, and IT may or may not know the intricacies of the data you work with, so a lot of times their guidance is incorrect. The reality is that in today's world the CTO, the CEO, the chief health officer cannot manage all of the data for their hospital or healthcare network on their own; there's too much of it, and it touches too many roles in the organization. So one way I believe you can share that cognitive load is to build a cross-functional data stewardship team, not just on the dev team, because there are cross-functional roles on that team and on your DevSecOps team in general, but also looking at who else throughout the organization has enough literacy in data management to serve as a steward and really own that data. It takes a lot of work to create that data stewardship plan and get your council going, but once you invest that time up front, I do believe you can ease the cognitive load on any one team or person, because again, there's too much data in existence, and too much data being created each day, for one person to manage it anymore. That's the fastest way to burn out.

Jake Selby: Yeah, and I'll just add, I think it was an interesting question that Randall asked in the chat. There's the resilience of individuals, and how we make sure we have shared responsibility and structure, and then there's the resilience of the teams. My read of Randall's question is: how does the team handle being responsible for so many different things? Obviously the same individual isn't going to be doing the infrastructure and the testing and everything else. We all agree with the premise that a cross-functional team that doesn't have roadblocks, waiting for somebody else to do something, is the ideal state, but then the question is, okay, now they're asked to do all these things, and you need the specialty and the understanding of how to do them. Part of that is team construction: you've got to make sure you get people who have the different skill sets. When it comes to DevOps in particular, as I was mentioning, you have to make it a full-team, cross-functional responsibility so that we're all rowing in the same direction. Then there's the question of how this cross-functional team handles the load. Say you've built up the team: you have somebody who does database administration and is also a data engineer, you've got full-stack people who can build a web app, you've got somebody who knows DevOps and can help with CI/CD. How do you still make sure that they, as a team, handle that cognitive load? I think at that point it's really layered responsibility structures. The team has its internal responsibilities, but then you have external support, say the data engineering manager who's ensuring that engineers across multiple teams understand the right practices. It's a whole holistic organizational effort. I speak to this as the person managing the engineers and the DevOps engineers, so I'm living that experience, but it really takes a village, so to speak, to make sure the teams have all the resources and understanding they need to operate all these different functions.

David Greene: Yeah. So Joe, I wanted to pick up on something you said earlier about making sure the team has enough flexibility to do what they need to do. I think that's relevant here as well. It's greatly empowering to be given a broad range of responsibility on the team; you already have to have a mental model of the stack you're playing in, and by having a cross-functional team that's directly responsible for most of it together, you shorten the time to get answers and to develop a shared understanding of where you are and where you're going. And I agree with Jake that that's not enough: you then need to support them with experts. But I think the tension, Randall, in your question is something that is eased by the knowledge that you're empowered to find a solution and that you're being supported by experts if it's beyond the personal experience of anybody on your team.

Joe Ferris: Yeah, I was going to say, I think it's not realistic for anybody to know the whole universe of technologies used at any company. Everything is so broad: even if you just pick infrastructure, there's AWS, you might have Kubernetes on top of that, you have something managing the clusters, you have infrastructure as code, you have the CI/CD that's used to manage it all. I think "it takes a village" was a good way to summarize it. One thing I've found to be very effective, in addition to having cross-functional people who understand the grand plan, is having a culture of interaction between different specialties. To give one example from a recent issue we had, this was not a security issue, but we had autoscaling set up to have dedicated containers for a GraphQL API, because it was particularly intensive. What we found is that requests were not going to those containers, and we had two infrastructure platform engineers look at the problem for, I think, two days without being able to figure out why it wasn't getting routed to the containers; everything looked correct. Finally they set up a pairing session with one of the mobile developers, and in about 20 minutes they found out that the URL being sent was /graphql/graphql. The application accepted it happily, but the routing layer saw it as different. So having infrastructure engineers paired with developers, having developers paired with mobile engineers, having the team interact a lot, I think is really important: beyond just having the different disciplines and having the code be shared, you actually have to have the people interact.

Site reliability

Victoria Guido: That makes sense. And touching on incidents and issues, maybe we can move into talking about site reliability, and how critical that can be for different healthcare applications. Do any of you have a story about site reliability and how you improved the monitoring or the engagement around a site? Everybody looks so thoughtful; I don't want to interrupt if you're thinking it over.

Jake Selby: Obviously, monitoring and observability are key for any business with publicly facing assets that clients have access to; it doesn't look good when your stuff goes down. I think a big key is ensuring that you have continuous monitoring in place, but also that you leverage it in your workflows. To me, the big inflection points that cause outages are that you change the code or you change the data. So you want constant validation: checks in place that do periodic monitoring, where you're naturally validating traffic, but also mature CI/CD workflows so that when changes are made, you have the right testing in place. What does that look like when I push a new version of an API? It really depends; there are a lot of things that go into deciding what the ideal workflow looks like. For us, we're a B2B business, so our traffic isn't huge in terms of native traffic, and it doesn't make sense to do something as advanced as A/B testing. But how do we still validate things and have a seamless deployment experience? For those lower-traffic APIs, it makes sense to do a blue-green deployment model: deploy to one side, simulate a bunch of traffic with the test cases we've generated, hit it a bunch of times, make sure it's working, but run that through the same monitoring system you already have in place, so that the checks you designed are also the ones being checked natively in the wild. It's a bit of an anecdote, but the idea is that you leverage your monitoring systems both for software deployment and for day-to-day operations, and also put a lot of thought into how you deploy, how you make and validate changes, and whether the solution matches your use case.
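One way to implement the shared check Jake describes is a scripted synthetic test that runs both during deploys and continuously in production. Here is a hypothetical Terraform sketch using a CloudWatch Synthetics canary; the script, bucket, role, and runtime values are placeholders:

```hcl
variable "canary_role_arn" {
  type        = string
  description = "IAM role the canary runs under (hypothetical input)"
}

# Hypothetical CloudWatch Synthetics canary: the same scripted check used to
# validate the "green" side of a blue-green deploy also runs continuously
# against production, so deploy-time and day-to-day monitoring share one definition.
resource "aws_synthetics_canary" "api_smoke_test" {
  name                 = "api-smoke-test"
  artifact_s3_location = "s3://app-canary-artifacts/"  # hypothetical bucket
  execution_role_arn   = var.canary_role_arn
  handler              = "apiCanary.handler"           # hypothetical script entry point
  runtime_version      = "syn-nodejs-puppeteer-6.2"    # pick a currently supported runtime
  zip_file             = "canary/api-smoke-test.zip"   # hypothetical packaged test cases
  start_canary         = true

  schedule {
    expression = "rate(5 minutes)"
  }
}
```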

Joe Ferris: Yeah, I think the most success we've had with SRE is when the measurements engineers use for reliability are the same ones people use for customer success. To give a concrete example, we had a client that had to keep vaccination records, and they had to be submitted to a central database. So we would track the lag time between when a vaccine was administered and when the record was sent to and received by the central database. By monitoring things that people actually care about, that the end users care about, you improve the time to respond when things are really adversely affected, but you also reduce pager fatigue. We don't page people for things like high database CPU utilization, because you just get used to that alert going off and you start to ignore it, whereas vaccines not being submitted to a database is something somebody will actually pay attention to if that's the kind of message they get.
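Expressed in Terraform, that kind of business-level alert might look like the following sketch. The metric, namespace, and threshold are hypothetical; the application itself would publish the lag metric:

```hcl
# Hypothetical alarm on a business-level signal: page when vaccination records
# take too long to reach the central registry. The metric and namespace are
# made-up names for a metric the application itself would publish.
resource "aws_sns_topic" "oncall" {
  name = "oncall-pages"
}

resource "aws_cloudwatch_metric_alarm" "submission_lag" {
  alarm_name          = "vaccination-submission-lag"
  namespace           = "App/Compliance"                  # hypothetical namespace
  metric_name         = "VaccinationSubmissionLagSeconds" # hypothetical metric
  statistic           = "Maximum"
  period              = 300
  evaluation_periods  = 3
  comparison_operator = "GreaterThanThreshold"
  threshold           = 3600 # illustrative: an hour of lag is worth waking someone up for
  treat_missing_data  = "breaching" # no data at all is also a problem
  alarm_actions       = [aws_sns_topic.oncall.arn]
}
```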

Jake Selby: Yeah, I think it's really key to define those, what I call synthetic checks: essentially something bigger than just "my service is returning a 500." Defining what those are, and to Joe's point, the closer you align them to actual business use cases, the better, because that's the kind of thing your clients are going to see break anyway.

David Greene: I think that's a really fortunate case you're describing, where the SREs are touching something that's directly applicable to the clients. One strategy I've seen that can be useful, if you don't have something that direct, is to measure how useful the SREs are being to the devs. If you've got a separate group, are you making it easier to deploy, easier to test, easier to get all the information you need as a dev? Service reliability isn't all of what SRE is, but there's an overlap, and you can draw some of your measures from there at the very least, because developer precision and developer speed certainly translate into something the customer cares about.

Victoria Guido: Right. And David, I understand you work on projects where providers are actually delivering care while they're using the application, so I'm sure you have to define certain metrics around reliability and security in that case. Is that right?

David Greene: Yeah. So, as Jake mentioned, his company is getting information from Medicare and then providing it, business to business, to maybe an ACO, so that directness is filtered through somebody like Jake and his company. But you're absolutely right that the bar is raised a lot higher when we know where the information is going. What if the information was on the wrong patient? What if we missed reporting on a key diagnosis or a drug they were taking, and it messed things up? There are people who could get hurt. Knowing those things makes you put in as many checks and balances as you possibly can.

Jake Selby: Yeah, with David's use case I can only imagine: being the original producer whose data is then taken by other entities, reworked, and sent back out is a big responsibility, so I can only imagine the strictures you put in place. And follow that chain back. We're in healthcare, and the data we're talking about is claims. Those claims came from healthcare providers and were entered with a wide range of data quality. How good is that data? That's part of what Lauren's been talking about. You can't always depend on it, and the more people and systems that touch it, the more room there is for it to get out of whack, even with the best motives of cleaning it up.

Health data standards

Lauren Maffeo: I read an article once about the many challenges with the US healthcare system, and one of them is the lack of a centralized database. If you think about the National Health Service in England, for instance, they have a centralized repository by virtue of being a singular health service. Of course there's also private healthcare in the UK, but by and large there is a centralized resource that can be used for patient data. Because healthcare is so fragmented in the US, that creates enormous challenges when you're talking about data and data governance, because forming a data governance council across US healthcare, I hate to say it's impossible, but I don't see how it's possible given how disparate the data is. So that is an enormous challenge that we need to solve, or at bare minimum be aware of. I had never heard it framed that way for healthcare data, how the lack of a centralized repository is such a challenge, but it really results in, to put it kindly, a negative user experience and a negative patient experience, and the consequences can be very, very serious.

Victoria Guido: Absolutely, yeah. Hopefully we're bringing up all kinds of topics and ideas in the healthcare space, because when we talk about health tech, that's a huge range: everything from devices to web applications to back-end data and health records. Oh, and let's see, we got a question from the audience: is the FHIR standard an attempt to centralize, or at least make data more easily exchangeable, for any panelists who are familiar with it?

David Greene: I can field that; I'm relatively familiar with FHIR. Yes, the idea behind it is to create a single standard protocol for delivering healthcare data. For those who aren't familiar, FHIR stands for Fast Healthcare Interoperability Resources; essentially, it's a standard defining how you structure a JSON object that details a claim, among other things. Today we have all these different electronic health record companies with different formats, and that's a big problem: one company uses a certain field, another one doesn't, and things get thrown into a catch-all by organizations like CMS. It's a bit of a wild west out there. FHIR is very interesting; we've played around with some use cases for it, and there are a couple of reasons you'd want to get into it in healthcare. One is if you want to start consuming more current data. If you're looking at something like what CMS provides, like CCLF, the Claim and Claim Line Feed, those are monthly files in a lot of cases, whereas with FHIR there are a lot of ways, whether through Blue Button or other programs, to get much more current data. So, long story short, that is the attempt. I think there hasn't been as much adoption as they wanted a few years ago. Part of it is that it's a very verbose format, and when you start thinking about transmitting lots and lots of data, that verbosity can create issues for some use cases; it's going from comma-delimited datasets to a bunch of JSON, so you can imagine the kind of refactoring systems would have to do to process that data and do something with it. I think that's been the big elephant that has kept things slow so far. I want to add one thing, as somebody who's worked with the government on helping create a small sub-standard within FHIR: there are so many different use cases to debate, and FHIR does a great job of letting you specify, okay, we're going to have this data set, we can have multiple copies of it, these are the possible values, we can inherit from these other things. There are a lot of groups doing a lot of work saying, okay, for our section of the industry, these are the problems we need to solve; it's very active. But at the end of the day, any given pair of organizations sharing data are going to have to agree on a lot of things, because there are a lot of decisions left by the standard to the individual implementation.

Joe Ferris: I think another challenge is the quality of the data as a consumer; even with a standard, there are many dialects. I've been on both sides, producing data and consuming data, and mostly the people producing it are trying to solve a very specific use case and aren't particularly interested in all the different things the standard has to offer. So, for example, they might develop a system that emits records about patients being admitted, but then doesn't do anything else. If that's part of your ingestion pipeline, you get all these weird things, like a person appearing in one hospital and then appearing in a different hospital two years later with no record of what happened in between. So even though FHIR does standardize some of the way we put things into JSON, there are still many different ways to put strings into those JSON files that can have interesting consequences. I think part of the reason it got designed that way is that there were so many use cases being supported that it had to be a very flexible standard; David can probably shed more light on this than I can, but the problem is that once you create that Swiss Army knife, you've got to support all those different pieces.

Victoria Guido: That makes a lot of sense. I want to give our audience a chance to ask some questions, so if you have one, put it in the chat; we have a little less than 20 minutes left. Any questions? Or if a panelist has a question for someone else on the panel, you're also invited to ask.

Emergent data, PII, and test data

Joe Ferris: I have a question, in terms of data governance and ownership. One problem I've seen a couple of times now is that in creating reliability metrics for the application, we accidentally or incidentally create more data that needs to be governed, and we don't do it intentionally. For example, we might measure the time it takes to do a certain thing, but as a byproduct you're actually recording, say, how long an appointment takes, and that becomes meaningful data you have to care about. So I guess my question is, how do you manage that emergent data? How do you manage ownership of it?

Lauren Maffeo: I would say it's complicated, but I do think the first step is to recognize that you cannot separate data quality from any of these conversations about ownership, metadata, and things of that nature. That might sound really obvious to folks on this call, but it's worth reinforcing, because literally today I had a conversation with someone in a stakeholder role who, when our data architect showed him the quality standards and how we wanted to automate them throughout the pipeline, said, "I don't understand why you're accounting for data quality in this diagram; I want to see how you're going to manage metadata, and I want to see how you're going to manage data quality in a separate chart." The entire concept of those being two separate endeavors is a huge problem. So going back to your question about what to do when you create new data: it starts by recognizing that data quality is not a separate output or goal; it has to be at the forefront, and you have to account for it in all of your pipelines and diagrams and design for it at the outset. The other thing is to ask yourself which data is really important, and set up mechanisms for data destruction. Especially in healthcare, hanging on to any and all data is not just a security risk and a liability, it's really expensive. If you're housing all of that data in the cloud or in data centers, you're spending a lot of money on data, probably 90% of which you don't use. So if you inadvertently create more data, it's not necessarily that you have to govern it at all; you have to ask yourself, is this useful? If so, which business use cases does it serve? Who already owns the data in that domain, and who would be the best person to take it? If it's noise, it's probably better to destroy it. And I will say, getting into the habit of data destruction is important, because the volume of data is so high.
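A small, concrete way to automate that data-destruction habit is a lifecycle rule on the storage holding the emergent data. This is a hypothetical sketch; the bucket name, prefix, and retention period are placeholders and would need to follow your actual retention requirements:

```hcl
# Hypothetical bucket holding emergent metrics data with no retention requirement.
resource "aws_s3_bucket" "metrics" {
  bucket = "app-emergent-metrics" # hypothetical bucket name
}

# Expire that data automatically instead of letting it accumulate forever.
resource "aws_s3_bucket_lifecycle_configuration" "metrics" {
  bucket = aws_s3_bucket.metrics.id

  rule {
    id     = "expire-emergent-metrics"
    status = "Enabled"

    filter {
      prefix = "emergent-metrics/" # hypothetical prefix
    }

    expiration {
      days = 90 # illustrative retention period, not a compliance recommendation
    }
  }
}
```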

Jake Selby: Yeah, I definitely agree that automating things like log rotation and lifecycle policies, for data you don't need to keep and have no requirement to retain for a certain period, is always a good way to clean things up. To Joe's more general point about what happens when you create data that creates more governance, whether it's metadata or generated data, I think that's just the way of the world, especially with metadata. When you design metadata-driven applications, where so much of the functionality is driven by what's actually in the database, you realize that's a very powerful thing: now it's not just gating my code, it's also putting gates and structure around how I version the data itself, how it gets protected, and ensuring that only the right processes can update it. And when it comes to logging, sometimes the logs have sensitive things in them, or, to Joe's point, you might be calculating things that actually fall under the standards for what counts as personally identifiable health information. One of the use cases we've been working on is finding ways to de-identify data that we're producing, and I've learned so much about the intricacies of what can be considered something you could use to reverse-calculate someone's identity. So you do have to be careful about the data your applications produce, so that you don't inadvertently create PII and all the obligations that come along with it.
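
Jake's point about logs leaking identifiers is one place where a small amount of code helps. Below is a minimal sketch of a logging filter that redacts a couple of obvious patterns before a record is emitted; the patterns are illustrative assumptions, and real de-identification (for example under HIPAA's Safe Harbor provision) covers far more identifiers than these.

```python
import logging
import re

# Hypothetical patterns; real healthcare de-identification covers many
# more identifiers than these two examples.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:\s]*\d{6,}\b", re.IGNORECASE),
}

class RedactPIIFilter(logging.Filter):
    """Redact obvious identifiers before a log record is emitted."""
    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for label, pattern in PATTERNS.items():
            message = pattern.sub(f"[REDACTED-{label.upper()}]", message)
        record.msg, record.args = message, None
        return True

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("appointments")
logger.addFilter(RedactPIIFilter())
logger.info("Scheduled follow-up for MRN 1234567, SSN 123-45-6789")
# -> Scheduled follow-up for [REDACTED-MRN], SSN [REDACTED-SSN]
```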

Lauren Maffeo: That was actually my question. I was going to ask if anybody on this call has experience with PII (personally identifiable information) in healthcare, and how you manage it throughout the DevSecOps pipeline and lifecycle, either masking it or, in some cases, unmasking it. I'd be curious to hear how folks on the panel or in the audience have handled that before, especially the nuances that come up in healthcare.

David Greene: One use case you may not be talking about, but that you remind me of, is the importance of having test data, and not using de-identified data as test data. It's really hard to get realistic test data. Does it actually cover all the different health care outcomes you might expect? Is it consistent for a given individual across time? Does it reflect what your users care about? It can be expensive and hard to do, but it's worth doing, because, as Jake was pointing out, you can't reverse engineer it; it's been created entirely from scratch.
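
As a rough sketch of what "created from scratch" can mean in practice, the fixture generator below produces wholly synthetic records, so there is nothing to reverse engineer, and a fixed seed keeps each synthetic "individual" consistent across test runs. The field names and value lists are assumptions for illustration; a realistic generator also has to stratify across the different slices of the data, which is exactly the difficulty Jake raises next.

```python
import random
from datetime import date, timedelta

random.seed(42)  # reproducible fixtures: the same "individuals" every run

FIRST_NAMES = ["Alex", "Sam", "Jordan", "Riley"]
CONDITIONS = ["hypertension", "type 2 diabetes", "asthma", "none"]

def synthetic_patient(patient_id: int) -> dict:
    """Build a wholly synthetic patient record; nothing is derived from real records."""
    birth = date(1950, 1, 1) + timedelta(days=random.randint(0, 20000))
    return {
        "id": f"TEST-{patient_id:05d}",
        "name": random.choice(FIRST_NAMES),
        "birth_date": birth.isoformat(),
        "condition": random.choice(CONDITIONS),
        # A consistent per-patient history matters more than realism of any one field.
        "encounter_years": sorted(random.sample(range(2015, 2024), k=3)),
    }

fixtures = [synthetic_patient(i) for i in range(100)]
```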

Jake Selby: Yeah, wholeheartedly. Creating a representative data set that is stratified across all the different slices of the data is very, very challenging if you're not going the de-identification route. And Lauren, my company's bread and butter is taking PII, processing it, and producing it back out, either in identifiable form or not; that's what we do. So we have a lot of different processes we have to put in place to protect that data. We have network access policies as well as database-level policies; it's layers of protections. We have a lot of data governance around how we promote datasets, either to production or to be sent to customers, and a lot of automated validations to make sure the correct data is being propagated for the right client, et cetera. We've also had to segment our different types of data. Lauren, you mentioned earlier mapping out your data use cases, and we've had to do that: the PII goes in one instance that has the highest security, with special networking policies for access, special everything, and the data that's less sensitive can go in a different place with less restrictive access policies. So you have your database-level permissions, but you also have network-based controls, where I can't even get to the database. Having those different layers of protection and governance for how you provision access, and putting as much automation in place as possible, is key. One of the early things we did was build a custom SCIM implementation: a custom application that provisions access to all of our different services and accounts, so we have a centrally managed place to provision access to all of our systems. That's the kind of thing you have to do when you get into the business of housing and distributing PII.
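
A very small sketch of the segmentation Jake describes might route datasets into either a locked-down store or a general analytics store based on whether they carry identifying columns. The store names and column list below are assumptions for illustration; in practice the surrounding network policies, database permissions, and SCIM-driven provisioning do most of the real work.

```python
from dataclasses import dataclass

# Hypothetical storage targets: the restricted store sits behind stricter
# network policies and database permissions than the general analytics store.
PII_STORE = "s3://restricted-pii-zone/"
GENERAL_STORE = "s3://analytics-zone/"

PII_COLUMNS = {"name", "dob", "ssn", "address", "mrn"}

@dataclass
class Dataset:
    name: str
    columns: set

def storage_target(dataset: Dataset) -> str:
    """Route a dataset to the locked-down store if any column looks identifying."""
    return PII_STORE if dataset.columns & PII_COLUMNS else GENERAL_STORE

claims = Dataset("claims_2023", {"claim_id", "mrn", "cost"})
rollup = Dataset("monthly_cost_rollup", {"month", "cost"})
assert storage_target(claims) == PII_STORE
assert storage_target(rollup) == GENERAL_STORE
```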

Joe Ferris: Yeah, I think in terms of DevOps, one big one is to make sure that all data changes and queries go through the pipeline. Developers get used to having production access and doing quick checks directly on the data, and people build a skill set of debugging through production consoles. So you have to build a separate process for that, where people learn to submit pull requests that run jobs which record something about the data. That way somebody gets to review it before it runs and can say: actually, that's going to create new PII we have to care about, or actually, nobody should be seeing that. So, having everything go through DevOps and through the CI/CD pipeline. The other thing that's really helpful, if you assume that at some point people are going to be accessing the data, is making sure you have audit logs of everything. If you can't prevent people from accessing the data, you can at least see what happened and recreate the events after the fact.
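
To make those two habits concrete, here is a minimal sketch (all names and helpers are hypothetical) of a query job that lives in the repository, reaches production only through review, and writes an audit record of who ran what and when, so that access you cannot prevent can at least be reconstructed.

```python
import getpass
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("audit")

def run_reviewed_query(query_name: str, sql: str, run_query):
    """Run a query that was merged via pull request, recording who ran what and when."""
    audit.info(json.dumps({
        "event": "data_query",
        "query": query_name,
        "sql": sql,
        "actor": getpass.getuser(),
        "at": datetime.now(timezone.utc).isoformat(),
    }))
    return run_query(sql)

# Usage: the SQL lives in the repository and reaches production only through CI,
# so reviewers see it before it runs, and the audit log shows that it ran.
result = run_reviewed_query(
    "appointment_duration_check",
    "SELECT avg(ended_at - started_at) FROM appointments",
    run_query=lambda sql: None,  # stand-in for the real database client
)
```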

Victoria Guido: Oh, that's great. Thank you all for your input and for being on the panel. We're coming to the end of our time here, so I'd like to go back around and get final takeaways from everyone. And if you could start off with your name and title, since I didn't hit record in time, so we'll have it in the video, that would be great. Let's see, who am I going to pick? Who is closest to the center of DC? I think it's probably Lauren.

Final takeaways

Lauren Maffeo: I'm a service designer at Steampunk, and I'm also the author of a book called Designing Data Governance from the Ground Up. I would say that in the context of healthcare data, it's really essential to start by knowing where your data lives. That's another big challenge for large bureaucratic organizations; from working with clients in these organizations, I know it can feel impossible to get answers. Does the data originate in a CSV on some random person's computer? Is it in an on-premise server? Is it in AWS? We don't know. Taking the time to really document not only what data you have, but where it lives, is super essential, especially as it relates to health care.

Jake Selby: I'm Jake Selby, the Vice President of Engineering at CareJourney. My takeaway is that this stuff is hard, and it's even harder with the stakes of health care. The big thing I've heard everybody preaching is enabling developers: making sure we're not the ones shutting things off, and making sure we give more knowledge to our teams. But at the same time that we give more responsibility and more power to all the engineers, we also have to put in place very strong governance programs and top-down procedures and processes, so that while you're giving people more flexibility and letting them move faster, you're also validating that they're doing things the right way and maintaining security. You do that with process and procedure, but you also do it with automated validations and checks. So: automate, empower, trust, but verify.

Victoria Guido: Thank you, Jake. David, do you want to go next?

David Greene: Sure. My name is David Greene, and I'm a principal engineer, currently contracting for Medicaid on a data warehouse. To conclude, I'd say that while our topic is compliance and security, we often think of those as things that stand in the way of getting things done. But DevOps, with the automation it brings and the checks and balances it gives, can really overcome that and help you get things done faster. If you work together and look at compliance holistically, as Lauren was saying when she talked about service design, and think about everything you're doing within a larger framework, there's so much potential in DevOps to make security and compliance something you're relieved about, something that's actually helping you go faster.

Victoria Guido: Wonderful.

Joe Ferris: I'm Joe Ferris, the CTO and a developer at thoughtbot. I'm in the Boston area, so I'm within train distance of DC. My big takeaway is that creating a culture of enablement and group ownership of practices, processes, and also mistakes is really essential to having any kind of security culture. If there are things people can't talk about, or if people are afraid to try to change something, then it's not going to go well.

Victoria Guido: That's a great point to end on. A topic we always bring up at DevOps DC as well is that it's not just automation or a set of fancy tools; it's also the culture, and that's how you build really resilient teams. So thank you all so much for coming here, spending time with us, and sharing your thoughts today. I'll hand it back to Christina to wrap us up.