
Distributed Intrusion Detection in Partially Observable
Markov Decision Processes
Mathematical & Computer Sciences Department

ABSTRACT
The problem of decentralized control occurs frequently in realistic domains where agents have to cooperate to achieve a universal goal. Planning for a domain-level joint strategy takes into account the uncertainty of the underlying environment in computing near-optimal joint-strategies that can handle the intrinsic domain uncertainty. However, uncertainty related to agents deviating from the recommended joint-policy is not taken into consideration. We focus on hostile domains, where the goal is to quickly identify deviations from planned behavior by any compromised agents. There is a growing need to develop techniques that enable the system to recognize and recover from such deviations. We discuss the problem from the intruder's perspective and then present a distributed intrusion detection scheme that can detect a particular type of attack.

Categories and Subject Descriptors
I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence

General Terms

Keywords

INTRODUCTION
The problem of decentralized control can be effectively modeled as a decentralized partially observable Markov Decision Process (DEC-POMDP) [1] where the agents in the model are the decision makers. The uncertainty related to each agent's action spans not only the uncertainty of the underlying environment but also the decision making of the other agents in the system. In hostile domains, malfunctioning of the agents due to internal failure or even due to enemy intrusion cannot be discounted. There is a growing need to develop multiagent systems that can address such situations and can efficiently compensate for such setbacks. We analyze the problem from the perspective of the malicious agent and come up with one possible attack that it can launch on the system. We then use decision-theoretic paradigms to obtain a solution to such attacks and propose reputation update schemes that distributively identify such malicious agents. We substantiate our claims with results from the popular multiagent tiger domain [2].
DEC-POMDP MODEL
A DEC-POMDP is given by a tuple < I, S, {A_i}, {Ω_i}, O, P, R, b_0 >, where I is the finite set of agents indexed by 1 . . . n; S is a finite set of states; A_i is a finite set of actions available to agent i, and A = ×_{i∈I} A_i is the set of joint actions, where a = < a_1, . . . , a_n > denotes a joint action; Ω_i is a finite set of observations available to agent i, and Ω = ×_{i∈I} Ω_i is the set of joint observations, where o = < o_1, . . . , o_n > denotes a joint observation; O is the observation function given by O(s, a_1, . . . , a_n, o_1, . . . , o_n), the probability of observing the joint observation (o_1, . . . , o_n) when transitioning to state s after taking joint action (a_1, . . . , a_n); P is the set of Markovian state transition probabilities, where P(s, a, s′) denotes the probability of taking action a in state s and reaching state s′; R : S × A → ℝ is the reward function; and b_0 is the initial belief state for all agents. We assume that the agents' observations are independent. Thus the observation function can be represented as O = ×_{i∈I} O_i, where O_i(s, a_1, . . . , a_n, o_i) is the probability that agent i observes o_i given that the joint action < a_1, . . . , a_n > resulted in state s. The decision problem spans a finite horizon T.
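To make the tuple concrete, the sketch below shows one way such a model could be held in code. It is only an illustrative structure under the independence assumption above; none of these names or types come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = str
Action = str
Obs = str
JointAction = Tuple[Action, ...]          # < a_1, ..., a_n >

@dataclass
class DecPOMDP:
    """Illustrative container for < I, S, {A_i}, {Omega_i}, O, P, R, b_0 >."""
    agents: List[int]                                     # I, indexed 1..n
    states: List[State]                                   # S
    actions: Dict[int, List[Action]]                      # A_i for each agent i
    observations: Dict[int, List[Obs]]                    # Omega_i for each agent i
    # Independent per-agent observation functions O_i(s', a, o_i),
    # so that O factors as the product of the O_i.
    obs_fn: Dict[int, Callable[[State, JointAction, Obs], float]]
    trans: Callable[[State, JointAction, State], float]   # P(s, a, s')
    reward: Callable[[State, JointAction], float]         # R(s, a)
    b0: Dict[State, float]                                # initial belief state b_0
    horizon: int                                          # finite horizon T
```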
POLICY REPRESENTATION AND RUNNING BELIEF STATE
The policy for agent i, π_i, is represented by a policy tree (see Figure 1). Each node corresponds to an action and each edge corresponds to an observation that the agent makes at that time interval. We assume that there is a centralized planner that computes the policy tree for each agent. We introduce the concept of the running belief state of an agent as an execution-time estimate that the agent maintains over the physical states and the observation histories of the other agents in the system.
The running belief state of agent i at time interval t is given by RB_i^t : S × o_{-i}^t → [0, 1], where o_{-i}^t are the t'th observation histories of the other agents. We define Bel_i^t as the set of all such possible combinations of physical states and observation histories that have a positive probability in RB_i^t. Thus Bel_i^t can be defined as Bel_i^t = {b | RB_i^t(b) > 0}.

Figure 1: Policy tree.
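As an illustration of this representation (an assumed structure, not taken from the paper), a policy tree can be stored as nested action nodes keyed by observations, and RB_i^t as a map from (physical state, observation histories of the others) to probability, with Bel_i^t as its support:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

State = str
Action = str
Obs = str
OthersHistories = Tuple[Tuple[Obs, ...], ...]   # one observation history per other agent

@dataclass
class PolicyNode:
    """Policy-tree node: an action, with one child subtree per possible observation."""
    action: Action
    children: Dict[Obs, "PolicyNode"] = field(default_factory=dict)

# Running belief state RB_i^t: (state, others' observation histories) -> [0, 1].
RunningBelief = Dict[Tuple[State, OthersHistories], float]

def belief_support(rb: RunningBelief) -> set:
    """Bel_i^t = { b | RB_i^t(b) > 0 }."""
    return {b for b, p in rb.items() if p > 0.0}
```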
Running Belief State Update
The agents update RB_i^t and Bel_i^t with each execution step. The update occurs in two phases. The first phase is an exhaustive update over Bel_i^t; we call this estimate the a priori estimate, RB_i^{priori}:

RB_i^{priori}(s′, o_{-i}^{t+1}) = α_1 Σ_{(s, o_{-i}^t) ∈ Bel_i^t} RB_i^t(s, o_{-i}^t) × P(s, < π_i(o_i^t), π_{-i}(o_{-i}^t) >, s′) × O_i(s′, < π_i(o_i^t), π_{-i}(o_{-i}^t) >, o_i^{t+1}) × O_{-i}(s′, < π_i(o_i^t), π_{-i}(o_{-i}^t) >, o_{-i}^{t+1})    (1)

where α_1 is a normalizing constant and s, s′ ∈ S. Bel_i^{priori} is calculated accordingly. The second phase of the update happens after the agent takes its (t+1)'th step action and obtains the joint reward given by the random variable R^{t+1}. The agent can now calculate the a posteriori estimate RB_i^{posteriori} (Equation 2), where α_2 is another normalizing constant and r is the reward received at the (t+1)'th time step. The second probability term in Equation 2 returns a value of 1 or 0 depending on whether it is possible to get the reward r after taking the corresponding joint action; Bel_i^{posteriori} is updated accordingly. Agent i knows its own physical state and, using RB_i^t, it tries to approximately estimate the states of all j, j ∈ I, j ≠ i. It can be shown that Bel_i^t is always an over-estimate of the actual world state.
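A minimal sketch of the two-phase update is given below, assuming the helper callables (joint_action, trans, obs_i, obs_others, reward_possible) stand in for the committed policies and model components; the a priori step follows Equation 1, and the a posteriori step uses the simplest reading of the text, keeping only entries for which the received reward r is attainable and then renormalizing.

```python
from typing import Callable, Dict, Iterable, Tuple

State = str
Obs = str
Hist = Tuple[Obs, ...]                     # observation history of the other agents
Belief = Dict[Tuple[State, Hist], float]   # running belief state RB_i

def priori_update(rb: Belief,
                  my_next_obs: Obs,
                  states: Iterable[State],
                  others_next_obs: Iterable[Obs],
                  joint_action: Callable[[Hist], tuple],          # <pi_i(o_i^t), pi_-i(o_-i^t)>
                  trans: Callable[[State, tuple, State], float],  # P(s, a, s')
                  obs_i: Callable[[State, tuple, Obs], float],    # O_i(s', a, o_i^{t+1})
                  obs_others: Callable[[State, tuple, Obs], float]) -> Belief:
    """A priori estimate (Equation 1): propagate every entry of Bel_i^t through the
    transition and observation models under the committed joint policy, then normalize."""
    states = list(states)
    others_next_obs = list(others_next_obs)
    new_rb: Belief = {}
    for (s, hist), p in rb.items():
        if p <= 0.0:
            continue
        a = joint_action(hist)
        for s_next in states:
            for o_other in others_next_obs:
                w = (p * trans(s, a, s_next)
                       * obs_i(s_next, a, my_next_obs)
                       * obs_others(s_next, a, o_other))
                if w > 0.0:
                    key = (s_next, hist + (o_other,))
                    new_rb[key] = new_rb.get(key, 0.0) + w
    z = sum(new_rb.values())                                      # 1 / alpha_1
    return {b: w / z for b, w in new_rb.items()} if z > 0.0 else new_rb

def posteriori_update(rb_priori: Belief,
                      r: float,
                      reward_possible: Callable[[State, Hist, float], bool]) -> Belief:
    """A posteriori estimate: drop entries inconsistent with the received joint reward r
    (the 0/1 term described for Equation 2), then renormalize with alpha_2."""
    kept = {b: p for b, p in rb_priori.items() if reward_possible(b[0], b[1], r)}
    z = sum(kept.values())
    return {b: p / z for b, p in kept.items()} if z > 0.0 else kept
```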
Anticipated set of rewards
The anticipated set of rewards of agent i is the set of all possible rewards that it can obtain by taking the t'th action prescribed by the committed joint policy, given the estimate of agent i at time t. Formally, the anticipated set of rewards of i at time t is η_i^t = {R(s, < π_i(o_i^t), π_{-i}(o_{-i}^t) >) | (s, o_{-i}^t) ∈ Bel_i^{priori}}. Note that the agent has to use the Bel_i^{priori} estimate, as the anticipated rewards are calculated before the t'th action has been taken. If an agent i receives a reward r ∉ η_i^t at time t, it concludes that some agent(s) have deviated from their committed policies.

We assume that there can be compromised agents in the system that want to have a covert harmful impact on the system. A malicious agent k can only avoid suspicion if it takes actions which are consistent with the anticipated reward estimates of the other agents. The moment k takes an action that produces a joint reward r ∉ η_i^t, agent i knows that some agent(s) must have deviated from the committed joint policy. k can, however, choose from the set of actions available for the different observations at the particular node of the policy tree it is in and thereby avoid detection. We refer to such agents as simple malicious agents. In Figure 1, if k is at branch b1 of π_k at time t, it chooses between the actions a1 and a2 pertaining to observations o1 and o2 respectively; k has no incentive to choose an action from another node, since the joint reward received would then not be guaranteed to be a member of η_i^t for all i ≠ k. A simple malicious agent is deterministic if it always takes the action corresponding to the least probable observation of the subtree of the policy tree it is in, or stochastic if it takes any action corresponding to the observations of the subtree with equal probability.

Reputation management
To identify deviations from committed plans by compromised agents, each agent maintains an updated reputation for all other agents in the system. This reputation management scheme runs on all agents in a completely distributed fashion. Each agent i maintains a set V_i = {R_ji}, where R_ji is the reputation of the jth agent as computed by agent i, ∀ j ≠ i, and is updated in each iteration by Equation 3, in which Bel_max^t is a subset of beliefs most convincing to i. Based on Bel_max^t, i tries to reason about the last observational transition that each of the other agents has made. The numerator of the second expression in Equation 3 gives the difference between the probability of observing the most probable observation given the (t-1)'th joint action and the probability of observing the t'th observation given the same. Note that a simple malicious agent k would often fake observations, and this inconsistency would gradually reflect in the Bel_i^t of i and result in a higher value for the numerator. This would result in a sharp drop of R_ki. The function κ is monotonically decreasing and thus facilitates faster detection.
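The sketch below ties the anticipated-reward test to a reputation decay. The set η_i^t and the membership test follow the definitions above; since Equation 3 and the exact form of κ are not reproduced here, the update uses an exponential decay in the observed probability gap purely as an assumed stand-in for a monotonically decreasing κ, and all names are illustrative.

```python
import math
from typing import Callable, Iterable, Set, Tuple

State = str
Hist = Tuple[str, ...]

def anticipated_rewards(bel_priori: Iterable[Tuple[State, Hist]],
                        joint_action: Callable[[Hist], tuple],
                        reward: Callable[[State, tuple], float]) -> Set[float]:
    """eta_i^t = { R(s, <pi_i(o_i^t), pi_-i(o_-i^t)>) | (s, o_-i^t) in Bel_i^priori }."""
    return {reward(s, joint_action(hist)) for s, hist in bel_priori}

def deviation_detected(received: float, eta: Set[float], tol: float = 1e-9) -> bool:
    """Agent i concludes that some agent deviated if the joint reward is not in eta_i^t."""
    return all(abs(received - r) > tol for r in eta)

def update_reputation(rep: float, prob_gap: float, rate: float = 5.0) -> float:
    """Assumed stand-in for kappa: the larger the gap between the most probable
    observation's likelihood and the likelihood of the observation agent j appears
    to have acted on, the sharper the drop in R_ji."""
    return rep * math.exp(-rate * max(prob_gap, 0.0))
```

In a run, agent i would apply update_reputation(R_ji, gap_j) to every other agent j at each step, with gap_j estimated from Bel_max^t; honest agents incur only the small drop caused by domain uncertainty, while simple malicious agents accumulate large gaps and their reputation falls sharply.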
EXPERIMENTS
For our experiments we use a three-agent version of the Tiger problem [2] and refer to the agents as Agent1, Agent2 and Agent3. The observation function for each agent is generated randomly on each pass of the run. We assume domains where the probability of one observation given a state and joint action is significantly higher than that of the other observations. All results presented are averaged over ten runs.
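The observation functions used in the simulation can be mimicked with a helper like the one below, which, for every (state, joint action) pair, gives one randomly chosen observation most of the probability mass; the dominance weight of 0.85 is an arbitrary assumption, not the value used by the authors.

```python
import random
from typing import Dict, List, Tuple

JointAction = Tuple[str, ...]

def random_skewed_obs_fn(states: List[str],
                         joint_actions: List[JointAction],
                         observations: List[str],
                         dominance: float = 0.85,
                         seed: int = 0) -> Dict[Tuple[str, JointAction], Dict[str, float]]:
    """For each (state, joint action) pair, make one observation significantly more
    likely than the others and spread the remaining mass uniformly."""
    rng = random.Random(seed)
    table: Dict[Tuple[str, JointAction], Dict[str, float]] = {}
    for s in states:
        for a in joint_actions:
            likely = rng.choice(observations)
            rest = (1.0 - dominance) / max(len(observations) - 1, 1)
            table[(s, a)] = {o: (dominance if o == likely else rest) for o in observations}
    return table
```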
Figure 2: Agent3 deterministically takes actions corresponding to the wrong observation.

Figure 3: Both Agent2 and Agent3 deterministically take actions corresponding to the wrong observation.
The initial policies for each agent are generated randomly for each iteration. We do not compute the optimal joint policy, as the reputation mechanism works well for any committed joint policy. The results presented are based on the calculations made by Agent1.
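One experimental run of the kind described here could be organized as the skeleton below, from the point of view of the observing agent (Agent1 in the plots); every callable and identifier is an assumed placeholder for the corresponding piece (policy execution, world simulation, belief update, reputation update), not the authors' code.

```python
from typing import Callable, Dict, List, Set, Tuple

def run_trial(horizon: int,
              others: List[int],
              choose_actions: Callable[[int], Dict[int, str]],  # joint action at step t (honest or malicious)
              world_step: Callable[[Dict[int, str]], float],    # advance the world, return the joint reward
              anticipate: Callable[[], Set[float]],             # eta^t computed by the observer before acting
              update_beliefs: Callable[[float], None],          # two-phase running-belief update
              reputation_of: Callable[[int], float],            # current R_j1 for agent j
              ) -> Tuple[Dict[int, List[float]], List[bool]]:
    """Skeleton of a single run: at every step the observer anticipates the possible joint
    rewards, the world advances under the (possibly compromised) joint action, beliefs are
    updated, and each other agent's reputation is recorded for later plotting/averaging."""
    reputation_history: Dict[int, List[float]] = {j: [] for j in others}
    deviation_flags: List[bool] = []
    for t in range(horizon):
        eta = anticipate()
        reward = world_step(choose_actions(t))
        # A reward outside eta^t means some agent must have deviated from the committed policy.
        deviation_flags.append(all(abs(reward - r) > 1e-9 for r in eta))
        update_beliefs(reward)
        for j in others:
            reputation_history[j].append(reputation_of(j))
    return reputation_history, deviation_flags
```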
In the first simulation, Agent3 is a deterministic simple malicious agent, whereas Agent2 takes actions based on its true observations. Figure 2 shows the individual reputation plots of the two agents in the system. The drop in reputation for Agent2 is due to the domain effect, but the much steeper drop for Agent3 is far from the normal drop caused by domain-level uncertainties. In the next simulation, we also changed Agent2 to a deterministic simple malicious agent.
Figure 3 shows the corresponding reputation plots, and we see that the reputations of both agents show a sharp fall.

Figure 4: Agent3 stochastically takes actions corresponding to the wrong observation.
In the next set of experiments, in an attempt to avoid detection, the simple malicious agents stochastically choose the action pertaining to the wrong observation. Again, in the first case we assume that Agent3 is the only malicious agent in the system. Figure 4 is the reputation plot for the corresponding case. The steeper drop in the reputation value for Agent3 clearly identifies that Agent3 has been taking more actions pertaining to wrong observations, so Agent1 can conclude that either the agent has faulty observational sensors or it has been corrupted by external influence. In the next simulation, both Agent2 and Agent3 behave as malicious agents that stochastically take actions corresponding to the wrong observation. Figure 5 shows the corresponding reputation plot of the two agents. The sharp dip in the reputations of the two agents signifies an abnormality in the system.

Figure 5: Both Agent2 and Agent3 stochastically take actions corresponding to the wrong observation.
CONCLUSION AND FUTURE WORK
In this paper we dealt with the problem of detecting agents that deviate from a committed joint policy. Assuming that the agents can be compromised, we discussed one possible type of deviation from prior commitments. We presented a distributed reputation update mechanism for uniquely detecting such malicious agents. Our scheme is based on tracking the execution-time estimates of the belief state of the system and the anticipated rewards. As future work we would also like to explore more challenging scenarios dealing with malicious agents, such as colluding agents that try to compromise the system.
Acknowledgment: US National Science Foundation award IIS-0209208 partially supported this work.
REFERENCES
[1] D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein. The complexity of decentralized control of Markov decision processes. Math. Oper. Res., 27(4):819-840, 2002.
[2] R. Nair, M. Tambe, M. Yokoo, D. V. Pynadath, and S. Marsella. Taming decentralized POMDPs: Towards efficient policy computation for multiagent settings. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI), 2003.

