HVAC systems must provide thermal comfort despite their complexity, so their controllers must be able to adapt to changing environmental parameters. Today most air conditioning systems use ON/OFF controllers (thermostats) or, in more advanced processes, Proportional-Integral-Derivative (PID) controllers, which are unable to provide the desired environmental conditions. Furthermore, these controllers are expensive in the long run because they operate at very low energy efficiency, and tuning HVAC systems is difficult and time consuming. In this study we consider a single-zone HVAC system operating in cooling mode. To reach the desired optimal operating point, we use a high-order fuzzy method that combines a Mamdani fuzzy controller, which assesses the variation of the control variables, with a Sugeno fuzzy controller for the linearization gain coefficient. Discrete action reinforcement learning automata are then used for the final setting of the gain. The simulation results show that this control strategy is more robust and flexible than other methods and is able to reduce energy losses.
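
The final gain-setting step can be illustrated with a minimal sketch. It is not the scheme used in this study: it assumes a simplified finite-action learning automaton with a linear reward-inaction update, standing in for the discrete action reinforcement learning automata named above. The function names `tune_gain` and `toy_cost`, the candidate gain values, and the quadratic cost are all hypothetical and chosen only for illustration.

```python
import random


def tune_gain(candidates, cost, steps=2000, rate=0.05, seed=0):
    """Learn a probability distribution over candidate gains.

    Assumption: `cost` returns a non-negative number, lower is better.
    This is a simplified reward-inaction automaton, not the paper's method.
    """
    rng = random.Random(seed)
    n = len(candidates)
    probs = [1.0 / n] * n  # start with no preference among the gains

    for _ in range(steps):
        # Sample one candidate gain according to the current probabilities.
        i = rng.choices(range(n), weights=probs)[0]

        # Map the cost to a reward signal in (0, 1]; zero cost gives 1.
        beta = 1.0 / (1.0 + cost(candidates[i]))

        # Reward-inaction update: shift probability mass toward the chosen
        # action in proportion to the reward; the total stays equal to 1.
        for j in range(n):
            if j == i:
                probs[j] += rate * beta * (1.0 - probs[j])
            else:
                probs[j] -= rate * beta * probs[j]

    return probs


def toy_cost(gain):
    # Hypothetical stand-in for a closed-loop performance index.
    return (gain - 2.0) ** 2


gains = [0.5, 1.0, 2.0, 4.0]  # hypothetical candidate gain values
probs = tune_gain(gains, toy_cost)
best = gains[max(range(len(gains)), key=probs.__getitem__)]
print("probabilities:", probs)
print("most probable gain:", best)
```

In a real application the toy cost would be replaced by a performance measure computed from the closed-loop HVAC response, such as tracking error or energy use, and the candidate set would cover the admissible gain range.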