
Is this horse dead yet: NTLM Bottlenecks and the RPC runtime


Hello again, this is guest author Herbert from Germany.

It’s harder to let go of old components and protocols than it is to drop old habits. But I’m falling back into an old habit myself…there goes the New Year’s resolution.

Quite recently we were faced with a new aspect of an old story. We hoped this problem would cease to exist as customers move forward with Kerberos-based solutions and other methods that facilitate Kerberos, such as smartcard PKINIT.

Yes, there are still some areas where we have to use NTLM for the sake of compatibility or in the absence of a domain controller. One of the most popular scenarios is disconnected clients using RPC over HTTP to connect to an Exchange mailbox. Another is web proxy servers, which still often use NTLM even though both they and most browsers also support Kerberos.

With RPC over HTTP you have two discrete NTLM authentications: the outer HTTP session is authenticated on the frontend server and the inner RPC authentication is done on the mailbox server. The NTLM load from proxy servers can be even worse - as each TCP session has to be authenticated - and some browsers frequently recycle their sessions.

One way or the other, you end up with a high rate of NTLM authentication requests. And you may have already found the “MaxConcurrentAPI” parameter, which controls the number of concurrent NTLM authentications a server will process. Historically there has been constant talk of a default of 2, but the defaults actually differ by role:

  • Member-Workstation: 1
  • Member-Server: 2
  • Domain Controller: 1

The limit applies per secure channel. Members can have only one secure channel to a DC in the domain of which they are a member. Domain controllers have one secure channel per trusted domain. However, as many customers follow a functional domain model of “user domains” and “resource domains”, the list of domains actually used for authentication is short, and thus DCs are limited to 1 concurrent authentication for a given “user domain”. Check out this diagram:

image

In this diagram, you see authentication requests (the colored boxes) started by users in the right-hand forest against servers in the left-hand forest. We are using the default values of MaxConcurrentAPI. The requests are forwarded along the trust paths to the right-hand forest; the trust paths used are shown by the arrows.

Now you see that on each resource forest DC up to 2 requests from member resource servers are queued. On the downstream DC, you get a maximum of 1 request from the grand-child domain. The same applies to the forest root DC. In this case, the only active authentication call for forest 1 is for the forest 2 grand-child domain, shown with brown API slots and arrows. Now that’s a real convoy…

The hottest link is between the forest root domains, as every NTLM request needs to travel through the secure channels between the forest1 root DCs and the forest2 root DCs.

From the articles you may know “MaxConcurrentAPI” can be increased to 10 with a registry change. Well, Windows Server 2008 and Windows Server 2008 R2 have an update which pushes the limit to 150:

975363 A time-out error occurs when many NTLM authentication requests are sent from a computer that is running Windows Server 2008 R2, Windows 7, Windows Server 2008, or Windows Vista in a high latency network
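
If you do end up raising the value, it is a registry change under the Netlogon parameters key followed by a Netlogon service restart. Here is a minimal PowerShell sketch, not a recommendation: the value of 10 is only an illustration, and going above 10 requires the update above on the affected operating systems.

    # Minimal sketch: raise MaxConcurrentApi on a busy member server or DC.
    # Size the value for your environment; 10 is only an example.
    $key = 'HKLM:\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters'
    New-ItemProperty -Path $key -Name 'MaxConcurrentApi' -PropertyType DWord -Value 10 -Force
    Restart-Service Netlogon   # the new limit takes effect when Netlogon restarts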

This should be of some help… In addition, Windows Server 2008 and later include a performance object called ”Netlogon” which lets you monitor the throughput, load and duration of NTLM authentication requests. You can add it to Windows Server 2003 using an update:

928576 New performance counters for Windows Server 2003 let you monitor the performance of Netlogon authentication

The article also offers a description of the counters. When you track the performance object, you’ll notice that each secure channel is visible as a separate instance. This allows you to track activity per domain, which DCs are used, and whether there are frequent fail-overs.

Beyond the article, these are our recommendations regarding performance baselines and alerts:

  • Semaphore Waiters – All semaphores are busy; we have threads, and thus logons, waiting in the queue. This counter is a candidate for a warning.
  • Semaphore Holders – The number of currently active callers. This is a candidate for a baseline to monitor. If it approaches your configured maximum, you need to act.
  • Semaphore Acquires – The total number of requests over this secure channel. When the secure channel fails and is re-established, the count restarts from 0. Check the _Total instance for a server-wide count. Good for monitoring the trend in baselines.
  • Semaphore Timeouts – An authentication thread has hit the wait time-out and the logon was denied. So the logon was slow, and then it failed. This is a very bad user experience and means the secure channel is overloaded, hung or broken. Also check the _Total instance. This is ALERT material.
  • Average Semaphore Hold Time – This should provide the average response time quite nicely. Also a candidate for baseline monitoring of trends.
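
To fold these counters into a baseline or an alert without keeping Perfmon open, something like the following PowerShell sketch can sample every secure channel instance and warn on timeouts. The counter paths assume the English names of the “Netlogon” object listed above.

    # Minimal sketch: sample the Netlogon semaphore counters for all secure channel instances
    # and warn whenever any timeouts (the ALERT-level counter) are recorded.
    $paths = '\Netlogon(*)\Semaphore Waiters',
             '\Netlogon(*)\Semaphore Holders',
             '\Netlogon(*)\Semaphore Timeouts',
             '\Netlogon(*)\Average Semaphore Hold Time'
    Get-Counter -Counter $paths -SampleInterval 5 -MaxSamples 12 | ForEach-Object {
        foreach ($sample in $_.CounterSamples) {
            if ($sample.Path -like '*semaphore timeouts*' -and $sample.CookedValue -gt 0) {
                Write-Warning ("Semaphore timeouts on {0}: {1}" -f $sample.InstanceName, $sample.CookedValue)
            }
        }
    }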

When it comes to discussing secure channels and maximum concurrency and queue depth, you also have to talk about how the requests are routed. Within a forest, you notice that the requests are sent directly to the target user domain.

When Netlogon finds that the user account is from another forest, however, it has to follow the trust path, similar to what a Kerberos client would do (just in the opposite direction). So the requests are forwarded to the parent domain and eventually arrive at the forest root DCs, and from there cross the forest boundary. You can easily imagine the Netlogon service queues and context items look like rush hour at the Frankfurt airport.

So who cares?

You might say that besides the domains becoming bigger nowadays, there’s not a lot of news for folks running Exchange or big proxy server farms. Well, recently we became aware of a new source of NTLM authentication requests that was in the system for quite some time, but that now has reared its head. Recently customers have decided to turn this on, perhaps due to recommendations in a few of our best practices guides. We’re currently working on having these updated.

RPC Interface Restriction was introduced in Windows XP Service Pack 2 and Windows Server 2003 Service Pack 1 and offers the option to force authentication for all RPC Endpoint Mapper requests. The goal was to prevent anonymous attacks on the service. The goal may also have been avoiding denial of service attacks, but that one did not pan out very well. The details are described here:

http://technet.microsoft.com/en-us/library/cc781010(WS.10).aspx

In this description, the facility is hard-coded to use NTLM authentication. Starting with Windows 7, the feature can also use Kerberos for authentication. So this is yet another reason to update.

The server will only require authentication (reject anonymous clients) if “RestrictRemoteClients” is set to 1 or higher. When you have the combination of applications with dynamic endpoints, many clients, and frequent reconnects in a deployment, you get a substantial number of authentications.
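
If you want to verify whether this is in play in your environment, the setting is normally delivered through policy into the registry. A small sketch, assuming the standard policy location for RPC interface restriction:

    # Sketch: check whether RPC interface restriction is enforced via policy.
    # Missing or 0 = anonymous endpoint mapper calls allowed; 1 or higher = authentication required.
    Get-ItemProperty -Path 'HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\RPC' `
                     -Name 'RestrictRemoteClients' -ErrorAction SilentlyContinue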

Some of the customers affected were quite surprised by the NTLM authentication volume, as they had configured their proxy servers to use Kerberos and were running Exchange without RPC over HTTP clients.

Exchange with MAPI clients is an application architecture that uses many different RPC interfaces, all using Endpoint Mapper. The list includes Store, NSPI, Referrer plus a few operating system interfaces like LSA RPC, each one of them triggering NTLM authentications. The bottleneck is then caused by the queuing of requests, done in each hop along the trust path.

Similar problems may happen with custom applications using RPC or DCOM to communicate. It all comes down to the rate of NTLM authentications induced on the AD infrastructure.

In our testing, we found that not all RPC interfaces are happy with a secured endpoint mapper; see Ned’s blog.

What are customers doing about it?

Most customers then increase “MaxConcurrentAPI”, which provides relief. Many customers also add monitoring of the Netlogon performance counters to their baseline. We also have customers who start to use secure channel monitoring, and when they see that a DC is collecting a disproportionate share of incoming secure channels, they use “nltest /SC_RESET” to balance resource domain controllers or member servers evenly across the downstream domain controllers.
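
For reference, the reset is issued on the computer that owns the secure channel, per trusted domain; a hedged example with placeholder names:

    # Sketch: force the secure channel for <UserDomain> to re-establish, optionally against a specific DC.
    nltest /SC_RESET:<UserDomain>\<PreferredDC>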

And yes, one way out of this is also setting the RPC registry entries or group policy to the defaults, so clients don’t attempt NTLM authentication. Since this setting was often required by the security department, it is probably not being changed in all cases. Some arguments that the secure Endpoint Mapper may not provide significant value are as follows:

1. The call is only done to get the server TCP port. The communication to the server typically is authenticated separately.

2. If the firewall does not permit incoming RPC endpoint mapper requests from the Internet, the callers are all from the internal network. Thus no information is disclosed to outside entities if the network is secure.

3. There are no known vulnerabilities in the endpoint mapper. The setting was once justified when vulnerabilities existed, but not today.

4. If you can’t get the security policy changed, ask the IT team to expedite Windows 7 deployment as it does not cause NTLM authentication in this scenario.

Ah, those old habits, they always come back on you. I hope you now have the tools and countermeasures to make all this more bearable.

Herbert “glue factory” Mauerer


Active Directory Site Topology, Not Just for DCs


Mark here again. Following a recent experience in the field (yes, Premier Field Engineers leave the office), I thought it’d be useful to discuss Active Directory Topology and how it influences DFS Namespace and DFS Folder referrals.

Let’s look at the following environment for the purposes of our discussion

image

Let’s suppose that the desired referral behaviour is for clients to use local DFS targets, then the DFS target in the hub site and finally, any other target.

Lastly, let’s assume the effective referral ordering is configured for “Lowest Cost”.

Note: DFS Namespace and Folder referrals are site-costed by default in Windows Server 2008 or later. The feature is also available in Windows Server 2003 (SP1 or later) but is disabled by default. The use of site-costed referrals is controlled by the following registry value

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dfs\Parameters]
Value Name: SiteCostedReferrals
Data Type: REG_DWORD
Value: 0 (off) or 1 (on)

What does the referral list look like for clients in each site?

Scenario A

Clients assigned an IP Address in the subnet 192.168.1.0/24 will associate themselves with the site Spoke-A. The DFS referral process will offer them the ordered list

DFS Target A
DFS Target Hub
<random ordering of DFS Target C, DFS Target D and DFS Target E>
DFS Target B

Great! This is pretty much what we’d designed for – the local target first, the hub second and random ordering of equally costed targets after that. The exception is DFS Target B, which cannot be costed without a site-link between site Spoke-B and any other site.

Scenario B

Clients assigned an IP Address in the subnet 192.168.2.0/24 will associate themselves with the site Spoke-B. The DFS referral process will offer them the ordered list

DFS Target B
<random ordering of DFS Target A, DFS Target C, DFS Target D, DFS Target E and DFS Target Hub>

In this scenario, we correctly receive the local DFS target heading the list but the rest of the referral order is random. Without a site-link from the site Spoke-B to any other site, DFS cannot calculate the cost of targets beyond the client site.

Scenario C

Clients assigned an IP Address in the subnet 192.168.3.0/24 do not associate with any site. The pattern seen in the site diagram above suggests the client should associate with the site Spoke-C but the missing subnet definition leaves clients in limbo. In fact, nltest.exe will show you this

image
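
As an aside, the nltest.exe check behind that screenshot is most likely its site lookup, shown below; on a client whose subnet is not defined in Active Directory, expect it to report an error rather than a site name.

    # Sketch: ask which AD site this client believes it belongs to.
    nltest /dsgetsite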

The DFS referral process will offer clients the ordered list

<random ordering of DFS Target A, DFS Target B, DFS Target C, DFS Target D, DFS Target E and DFS Target Hub>

Completely random – a long way from the design goal.

Scenario D/E

Clients assigned an IP Address in the subnet 192.168.4.0/24 will associate themselves with the site Spoke-D. The DFS referral process will offer them the ordered list

DFS Target D
<random ordering of DFS Target E and DFS Target Hub>
<random ordering of DFS Target A and DFS Target C>
DFS Target B

Here the local target is ordered first. DFS Target E and DFS Target Hub have the same cost and are ordered next in random order. This is because sites Hub, Spoke-D and Spoke-E are all linked with the same site-link and therefore have the same cost.

DFS Target A and DFS Target C are offered in random order following DFS Target E and DFS Target Hub – again because they have the same cost. Lastly, the un-costed DFS Target B is ordered.

Clients assigned an IP Address in the subnet 192.168.5.0/24 will associate themselves with the site Spoke-E and experience the same referral order as clients in site Spoke-D except the position of DFS Target D and DFS Target E in the referral order will be swapped.

This is close to the design goal but the site-link connecting three sites may cause DFS Target Hub to appear slightly out of order.

Conclusion

I’ve seen many poorly managed Active Directory topologies – most often when Domain Controllers reside in a central site. A well-defined topology is important for reasons other than DC replication and location – DFS referrals being a big one. Without properly defined site-links, subnets and site-to-subnet mappings, users may find themselves unwittingly directed to a file server in Mordor.

- Mark “AD Site Topology is Precious” Renoden.

Windows 8 for the IT Pro: The New Plumbing


Hi folks, Ned coming to you from the secret underground redoubt, where the cable is out, the wife is at grad school, and the dogs are napping as autumn finally reaches North Carolina.

image

I’m not a fan of blog posts that only aggregate links and don’t offer original thought. Today I make an exception, as the first official bits of Windows 8 have hit the street. As with all Windows pre-releases, you’ll notice two immediate problems:

  1. The consumer content overwhelms the IT Professional content.
  2. The Internet is a public toilet of misunderstanding, opinions masquerading as facts, and general ignorance.

Nothing wrong with the first point; we’re outnumbered at least a thousand to one, so it’s natural for advertising to target the majority. The second point I can’t abide; I despise misinformation.

Nothing has changed with my NDA - I cannot discuss Windows 8 in detail, speak of the future, or otherwise get myself fired. Nevertheless, I can point you to accurate content that’s useful to an IT Professional craving more than just the new touchscreen shell for tablets. My links talk a little Windows Server and show features that Mom won’t be using.

So, in vague order and with no regard to the features being Directory Services or not, here are the goods. Some are movies and PowerPoint slides, some are text. Some are Microsoft and some are not. Many are buried in the //Build site. I added some exposition to each link so I don’t feel so dirty.

Enjoy, it’s going to be a busy decade.

Intro (good for basic familiarity)

Security & Active Directory

Interestingly, no mainstream websites have discovered many of the AD changes visible in the server preview build, or at least, not written about them. Aha! Here they come, thanks for the tip Sean:

Virtualization, Networking, & High Availability

Deployment & Performance

Remember, everything is subject to change and refers only to the Developer Preview release from the //Build conference; Windows 8 isn’t even in beta yet. Grab the client or server and see for yourself.

And no matter what link you click, I don’t recommend reading the comments. See point 2.

image
Where do you want me to put this Internet?

Ned “bowl o’ links” Pyle

What the heck does /genmigxml do?


Hello guys and gals, Kim Nichols here with my first AskDS post. While deciding on a title, I did a quick search on the word "heck" on our AskDS blog to see if Ned was going to give me any grief. Apparently, we "heck" a lot around here, so I guess it's all good. :-)

I'm hoping to shed some light on USMT's /genmigxml switch and uncover the truth behind which XML files must be included for both scanstate and loadstate. I recently had a USMT 4 case where the customer was using the /genmigxml switch during scanstate to generate a custom XML file, mymig.xml. After creating the custom XML, the file was added to the scanstate command via the /i:mymig.xml switch along with any other custom XML files. When referencing the file again on loadstate, loadstate failed with errors similar to the following:

2011-08-01 18:40:50, Info  [0x080000] Current XML stack: <component type="Documents" context="KIMN\test2" defaultSupported="Yes"> "External_UserDocs - KIMN\test2"

2011-08-01 18:40:50, Error [0x08055d] MXE Agent: Migration XML C:\USMT\amd64\mig.xml is not properly formatted. Message: context attribute has an invalid value.

2011-08-01 18:40:50, Error [0x000000] EngineStartup caught exception: FormatException: context attribute has an invalid value. class UnBCL::ArrayList<class Mig::CMXEXmlComponent *> *__cdecl Mig::CMXEMigrationXml::LoadComponents(class Mig::CPlatform *,class UnBCL::String *,class UnBCL::XmlNode *,class Mig::CMXEMigrationXml *,class Mig::CMXEXmlComponent *,class Mig::CUserContext *)

2011-08-01 18:40:50, Info [0x080000] COutOfProcPluginFactory::FreeSurrogateHost: Shutdown in progress

From this error, it appears that user KIMN\test2 is invalid for some reason. What is interesting is that if that user logs on to the computer prior to running loadstate, loadstate completes successfully. Requiring all users to log on prior to migrating their data is not recommended and can cause issues with application migration.

I did some research to get a better understanding of the purpose behind the /genmigxml switch and why we hadn't received more calls on this issue. Here's what I found:

Technet: What's New in USMT 4.0 - http://technet.microsoft.com/en-us/library/dd560752(WS.10).aspx

“This option specifies that the ScanState command should use the document finder to create and export an .xml file that defines how to migrate all of the files found on the computer on which the ScanState command is running. The document finder, or MigXmlHelper.GenerateDocPatterns helper function, can be used to automatically find user documents on a computer without authoring extensive custom migration .xml files.”

Technet : Best Practices - http://technet.microsoft.com/en-us/library/dd560764(WS.10).aspx

“You can Utilize the /genmigxml command-line option to determine which files will be included in your migration, and to determine if any modifications are necessary.”

Technet: Step-by-Step: Basic Windows Migration using USMT for IT Professionals - http://technet.microsoft.com/en-us/library/dd883247(WS.10).aspx

"In USMT 4.0, the MigXmlHelper.GenerateDocPatterns function can be used to automatically find user documents on a computer without authoring extensive custom migration .xml files. This function is included in the MigDocs.xml sample file downloaded with the Windows AIK. "

We can use /genmigxml to get an idea of what the migdocs.xml file is going to collect for a specific user. We don't specifically document what you should do with the generated XML besides review it. Logic might lead us to believe that, similar to the /genconfig switch, we should generate this XML file and include it on both our scanstate and our loadstate operations if we want to make modifications to which data is gathered for a specific user. This is where we run into the issue above, though.

If we take a look inside this XML file, we see a list of locations from which scanstate will collect documents. This list includes the path for each user profile on the computer. Here's a section from mymig.xml in my test environment. Notice that this is the same user from my loadstate log file above.

clip_image002

So, if including this file generates errors, why use it? The answer is /genmigxml was only intended to provide a sample of what will be migrated using the standard XML files. The XML is machine-specific and not generalized for use on multiple computers. If you need to alter the default behavior of migdocs.xml to exclude or include files/folders for specific users on a specific computer, modify the file generated via /genmigxml for use with scanstate. This file contains user-specific profile paths so don't include it with loadstate.
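
As a hedged illustration (the store path and file names below are placeholders, and exact switch combinations should be checked against the USMT documentation for your build), the flow might look like this: generate and review the XML, include it on scanstate only, and leave it off loadstate.

    # Sketch: generate the machine-specific XML on the source computer for review/editing
    scanstate.exe /genmigxml:C:\USMT\mymig.xml

    # Include the (optionally edited) file on scanstate only
    scanstate.exe \\server\MigStore\PC01 /i:MigDocs.xml /i:MigApp.xml /i:mymig.xml /o

    # Do not pass mymig.xml to loadstate; the default XML files are enough
    loadstate.exe \\server\MigStore\PC01 /i:MigDocs.xml /i:MigApp.xml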

But wait… I thought all XML files had to be included in both scanstate and loadstate?

The actual answer is it depends. In the USMT 4.0 FAQ, we specify including the same XML files for both scanstate and loadstate. However, immediately following that sentence, we state that you don't have to include the Config.xml on loadstate unless you want to exclude some files that were migrated to the store.

The more complete answer is that the default XML files (MigApp.xml and MigDocs.xml) need to be included in both scanstate and loadstate if you want any of the rerouting rules to apply; for instance, migrating from one version of Office to another. Because MigApp.xml and MigDocs.xml transform OS and user data to be compatible with a different version of the OS/Office, you must include both files on scanstate and on loadstate.

As for your custom XML files (aside from the one generated from /genmigxml), these only need to be specified in loadstate if you are rerouting files or excluding files that were migrated to the store from migrating down to the new computer during loadstate.

To wrap this up, in most cases MigDocs.xml migrates everything you need. If you are curious about what will be collected, you can run /genmigxml to find out, but the output is computer-specific, so you can't reuse it elsewhere without modification.

- Kim "Boilermaker" Nichols

The PDCe with too much to do


Hi. Mark again. As part of my role in Premier Field Engineering, I’m sometimes called upon to visit customers when they have a critical issue being worked by CTS that needs another set of eyes. For today’s discussion, I’m going to talk you through one such visit.

It was a dark and stormy night …

Well not really – it was mid-afternoon but these sorts of things always have that sense of drama.

The Problem

Custom applications were hard coded to use the PDC Emulator (PDCe) for authentication – a strategy the customer later abandoned to eliminate a single point of failure. The issue was hot because the PDCe was not processing authentication requests after a reboot.

The customer had noticed lsass.exe consuming a lot of CPU and this is where CTS were focusing their efforts.

The Investigation

Starting with the Directory Service event logs, I noticed the following:

Event Type:      Information
Event Source:    NTDS Replication
Event Category:  Replication
Event ID:        1555
Date:            <Date>
Time:            <Time>
User:            NT AUTHORITY\ANONYMOUS LOGON
Computer:        <Name of PDCe>
Description:
The local domain controller will not be advertised by the domain controller locator service as an available domain controller until it has completed an initial synchronization of each writeable directory partition that it holds. At this time, these initial synchronizations have not been completed.
The synchronizations will continue.

also:

Event Type:      Warning
Event Source:    NTDS Replication
Event Category:  Replication
Event ID:        2094
Date:            <Date>
Time:            <Time>
User:            NT AUTHORITY\ANONYMOUS LOGON
Computer:        <Name of PDCe>
Description:
Performance warning: replication was delayed while applying changes to the following object. If this message occurs frequently, it indicates that the replication is occurring slowly and that the server may have difficulty keeping up with changes.
Object DN: CN=<ClientName>,OU=Workstations,OU=Machine Accounts,DC=<Domain Name>,DC=com
Object GUID: <GUID>
Partition DN: DC=<Domain Name>,DC=com
Server: <_msdcs DNS record of replication partner>
Elapsed Time (secs): 440
User Action
A common reason for seeing this delay is that this object is especially large, either in the size of its values, or in the number of values. You should first consider whether the application can be changed to reduce the amount of data stored on the object, or the number of values. If this is a large group or distribution list, you might consider raising the forest version to Windows Server 2003, since this will enable replication to work more efficiently. You should evaluate whether the server platform provides sufficient performance in terms of memory and processing power. Finally, you may want to consider tuning the Active Directory database by moving the database and logs to separate disk partitions.
If you wish to change the warning limit, the registry key is included below. A value of zero will disable the check.
Additional Data
Warning Limit (secs): 10
Limit Registry Key: System\CurrentControlSet\Services\NTDS\Parameters\Replicator maximum wait for update object (secs)

and:

Event Type:      Warning
Event Source:    NTDS General
Event Category:  Replication
Event ID:        1079
Date:            <Date>
Time:            <Time>
User:            <SID>
Computer:        <Name of PDCe>
Description:
Internal event: Active Directory could not allocate enough memory to process replication tasks. Replication might be affected until more memory is available.
User Action
Increase the amount of physical memory or virtual memory and restart this domain controller.

In summary, the PDCe hasn’t completed initial synchronisation after a reboot and it’s having memory allocation problems while it works on sorting it out. Initial synchronisation is discussed in:

Initial synchronization requirements for Windows 2000 Server and Windows Server 2003 operations master role holders
http://support.microsoft.com/kb/305476

With this information in hand, I had a chat with the customer hoping we’d identify a relevant change in the environment leading up to the outage. It became apparent they’d configured a policy for deploying RDP session certificates. Furthermore, they’d noticed clients receiving many of these certificates instead of the expected one.

RDP session certificates are Secure Sockets Layer (SSL) certificates issued to Remote Desktop servers. It is also possible to deploy RDP session certificates to client operating systems such as Windows Vista and Windows 7. More on this later…

The customer and I examined a sample client and found 285 certificates! In addition to this unusual behaviour, the certificates were being published to Active Directory. There were 3700 affected clients – approx. 1 million certificates published to AD!
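
If you ever need to size a problem like this yourself, a rough sketch using the Active Directory PowerShell module (available with Windows Server 2008 R2 / RSAT; try it against a test OU first) can count the certificate blobs per computer object:

    # Sketch: find computer objects with certificates published to userCertificate
    # and list the ones holding the most.
    Import-Module ActiveDirectory
    Get-ADComputer -LDAPFilter '(userCertificate=*)' -Properties userCertificate |
        Select-Object Name, @{Name='CertCount'; Expression={ $_.userCertificate.Count }} |
        Sort-Object CertCount -Descending |
        Select-Object -First 20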

The Story So Far

We’ve injected huge amounts of certificate data into the userCertificate attribute of computer objects, we’ve got replication backlog due to memory allocation issues and the DC can’t complete an initial sync before advertising itself as a DC.

What Happened Next Uncle Mark?!

The CTS engineer back at home base wanted to gather some debug logging of LSASS.exe. While attempting to gather such a log, the PDCe became completely unresponsive and we had to reboot.

While the PDCe rebooted, the customer disabled the policy responsible for deploying RDP session certificates.

After the reboot, the PDCe had stopped logging event 1079 (for memory allocation failures), but in addition to events 1555 and 2094, we were now seeing:

Event Type:      Warning
Event Source:    NTDS Replication
Event Category:  DS RPC Client
Event ID:        1188
Date:            <Date>
Time:            <Time>
User:            NT AUTHORITY\ANONYMOUS LOGON
Computer:        <Name of PDCe>
Description:
A thread in Active Directory is waiting for the completion of a RPC made to the following domain controller.
Domain controller: <_msdcs DNS record of replication partner>
Operation: get changes
Thread ID: <Thread ID>
Timeout period (minutes): 5
Active Directory has attempted to cancel the call and recover this thread.
User Action
If this condition continues, restart the domain controller.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

A bit more investigation with:

Repadmin.exe /showreps (or /showrepl for later versions of repadmin)

told us that all partitions were in sync except the domain partition – the partition with a million certificates attached to computer objects.

We decided to execute:

Repadmin.exe /replicate <Name of PDCe> <Closest Replication Partner> <Domain Naming Context> /force

Next, we waited … for several hours.

While waiting, we considered:

  • Disabling initial sync with:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters]

Repl Perform Initial Synchronizations = 0

  • Increasing the RPC timeout for NTDS with:

http://support.microsoft.com/default.aspx?scid=kb;EN-US;830746

Both of these changes require a reboot. The customer was hesitant to reboot again and while they thought it over, initial sync completed.

With the PDCe authenticating clients, I headed home to get some sleep. The customer had disabled the RDP session certificate deployment policy and was busy clearing the certificate data out of computer objects in Active Directory.

Why?

The next day, I went looking for root cause. The customer had followed some guidance to deploy the RDP session certificates. Some of the guidance noted during the investigation is posted here:

http://blogs.msdn.com/b/rds/archive/2010/04/09/configuring-remote-desktop-certificates.aspx

I set up a test environment and walked through the guidance. After doing so, I did not experience the issue. I was getting a single certificate no matter how often I would reboot or apply Group Policy. In addition, RDP session certificates were not being published in Active Directory. Publishing in Active Directory is easily explained by this checkbox:

image

An examination of the certificate template confirmed they had this checked.

So why were clients in the customer environment receiving multiple certificates while clients in my test environment received just one?

The Win

I noticed the following point in the guidance being followed by the customer:

image

A bit of an odd recommendation. Sure enough, the customer’s template had different names for “Template display name” and “Template name”. I changed my test environment to make the same mistake and suddenly I had a repro – a new certificate on every reboot and policy refresh.

Some research revealed that this was a known issue. One of these fields checks whether an RDP session certificate exists while the other field obtains a new certificate. Giving both fields the same name works around the problem.

Conclusion

So in the aftermath of this incident, there are some general recommendations that anyone can follow to help avoid this kind of situation.

  • Follow our guidance carefully – even the weird stuff
  • Test before you deploy
  • Deploy the same way as you test
  • Avoid making critical servers more critical than they need to be

- Mark “Falkor” Renoden

Advanced XML filtering in the Windows Event Viewer


Hi guys, Joji Oshima here again. Today I want to talk about using Custom Views in the Windows Event Viewer to filter events more effectively. The standard GUI allows some basic filtering, but you have the ability to drill down further to get the most relevant data.
Starting in Windows Vista/2008, you have the ability to modify the XML query used to generate Custom Views.

Limitations of basic filtering:

Basic filtering allows you to display events that meet certain criteria. You can filter by the event level, the source of the event, the Event ID, certain keywords, and the originating user/computer.

image
Basic Filter for Event 4663 of the security event logs

You can choose multiple events that match your criteria as well.

image
Basic filter for Event 4660 & 4663 of the security event logs

A real limitation of this type of filtering is that the data inside each event can be very different. 4663 events appear when auditing users accessing objects. You can see the account of the user and what object they were accessing.

clip_image001 clip_image002
Sample 4663 events for users ‘test5’ and ‘test9’

If you want to see events that are only about user ‘test9’, you need a Custom View and an XML filter.

Using XML filtering and Custom Views:

Custom Views using XML filtering are a powerful way to drill through event logs and only display the information you need. With Custom Views, you can filter on data in the event. To create a Custom View based on the username, right click Custom Views in the Event Viewer and choose Create Custom View.

image

Click the XML Tab, and check Edit query manually. Click ok to the warning popup. In this window, you can type an XML query. For this example, we want to filter by SubjectUserName, so the XML query is:

      <QueryList>
           <Query Id="0">
              <Select Path="Security">
                 *[EventData[Data[@Name='SubjectUserName'] and (Data='test9')]]
               </Select>
           </Query>
      </QueryList>

image

After you type in your query, click the Ok button. A new window will ask for a Name & Description for the Custom View. Add a descriptive name and click the Ok button.

image

You now have a Custom View for any security events that involve the user test9.

image

Take It One Step Further:

Now that we’ve gone over a simple example, let’s look at the query we are building and what else we can do with it. Using XML, we are building a SELECT statement to pull events that meet the criteria we specify. Using the standard AND/OR Boolean operators, we can expand upon the simple example to pull more events or to refine the list.

Perhaps you want to monitor two users - test5 and test9 - for any security events. Inside the search query, we can use the Boolean OR operator to include users that have the name test5 or test9.

The query below searches for any security events that include test5 or test9.

      <QueryList>
           <Query Id="0">
              <Select Path="Security">
                 *[EventData[Data[@Name='SubjectUserName'] and (Data='test5' or Data='test9')]]
               </Select>
           </Query>
      </QueryList>

Event Metadata:

At this point you may be asking, where did you come up with SubjectUserName and what else can I filter on? The easiest way to find this data is to find a specific event, click on the details tab, and then click the XML View radio button.

image

From this window, we can see the structure of the Event’s XML metadata. This event has a <System> tag and an <EventData> tag. Each of these data names can be used in the filter and combined using standard Boolean operators.

With the same view, we can examine the <System> metadata to find additional data names for filtering.

image

Now let’s say we are only interested in a specific Event ID involving either of these users. We can incorporate an AND Boolean to filter on the System data.

The query below looks for 4663 events for user test5 or test9.

      <QueryList>
           <Query Id="0">
              <Select Path="Security">
                 *[EventData[Data[@Name='SubjectUserName'] and (Data='test5' or Data='test9')]]
                 and
                 *[System[(EventID='4663')]]
               </Select>
           </Query>
      </QueryList>

Broader Filtering:

Say you wanted to filter on events involving test5 but were unsure whether the value would be in SubjectUserName, TargetUserName, or somewhere else. You don’t need to specify the exact data name; you can simply search for any data in <EventData> that matches test5.

The query below looks for events where any data in <EventData> equals test5.

      <QueryList>
           <Query Id="0">
              <Select Path="Security">
                 *[EventData[Data and (Data='test5')]]
              </Select>
           </Query>
      </QueryList>

Multiple Select Statements:

You can also have multiple select statements in your query to pull different data in the same log or data in another log. You can specify which log to pull from inside the <select> tag, and have multiple <select> tags in the same <query> tag.

The example below will pull 4663 events from the security event log and 1704 events from the application event log.

      <QueryList>
           <Query Id="0">
              <Select Path="Security">*[System[(EventID='4663')]]</Select>
             <Select Path="Application">*[System[(EventID='1704')]]</Select>
           </Query>
      </QueryList>

image
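
As a side note, these queries aren’t limited to the Event Viewer UI. The same XML can be fed to PowerShell’s Get-WinEvent cmdlet; here is a minimal sketch reusing the 4663 query above (run it from an elevated prompt so the Security log is readable):

    # Sketch: run the same XML query from PowerShell instead of a Custom View.
    $query = '<QueryList><Query Id="0">' +
             '<Select Path="Security">*[System[(EventID=''4663'')]]</Select>' +
             '</Query></QueryList>'
    Get-WinEvent -FilterXml ([xml]$query) -MaxEvents 20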

XPath 1.0 Limitations:

Windows Event Log supports a subset of XPath 1.0. There are limitations to what functions work in the query. For instance, you can use the "position", "Band", and "timediff" functions within the query but other functions like "starts-with" and "contains" are not currently supported.

Further Reading:

Create a Custom View
http://technet.microsoft.com/en-us/library/cc709635.aspx

Event Queries and Event XML
http://msdn.microsoft.com/en-us/library/bb399427(v=VS.90).aspx

Consuming Events (Windows)
http://msdn.microsoft.com/en-us/library/dd996910(VS.85).aspx

Conclusion:

Using Custom Views in the Windows Event Log can be a powerful tool to quickly access relevant information on your system. XPath 1.0 has a learning curve but once you get a handle on the syntax, you will be able to write targeted Custom Views.

Joji "the sieve" Oshima

[Check out pseventlogwatcher if you want to combine complex filters with monitoring and automation. It’s made by AskDS superfan Steve Grinker: http://pseventlogwatcher.codeplex.com/ – Neditor]

Friday Mail Sack: Super Slo-Mo Edition


Hello folks, Ned here again with another Mail Sack. Before I get rolling though, a quick public service announcement:

Plenty of you have downloaded the Windows 8 Developer Preview and are knee-deep in the new goo. We really want your feedback, so if you have comments, please use one of the following avenues:

I recommend sticking to IT Pro features; the consumer side’s covered and the biggest value is your Administrator experience. The NDA is not off - I still cannot comment on the future of Windows 8 or tell you if we already have plans to do X with Y. This is a one-way channel from you to us (to the developers).

Cool? On to the sack. This week we discuss:

Shake it.

Question

We were chatting here about password synchronization tools that capture password changes on a DC and send the clear text password to some third party app. I consider that a security risk...but then someone asked me how the password is transmitted between a domain member workstation and a domain controller when the user performs a normal password change operation (CTRL+ALT+DEL and Change Password). I suppose the client uses some RPC connection, but it would be great if you could point me to a reference.

Answer

Windows can change passwords many ways - it depends on the OS and the component in question.
Update: Clarified some wording, thanks Steve and JS!

1. For the specific case of using CTRL+ALT+DEL because your password has expired or you just felt like changing your password:

If you are using a modern OS like Windows 7 with AD, the computer uses the Kerberos protocol end to end. This starts with a normal AS_REQ logon, but to a special service principal name of kadmin/changepw, as described in http://www.ietf.org/rfc/rfc3244.txt.

The computer first contacts a KDC over port 88, then communicates over port 464 to send along the special AP_REQ and AP_REP. You are still using Kerberos cryptography and sending an encrypted payload containing a KRB_PRIV message with the password. Therefore, to get to the password, you have to defeat Kerberos cryptography itself, which means defeating the crypto and defeating the key derived from the cryptographic hash of the user's original password. Which has never happened in the history of Kerberos.

image

The parsing of this kpasswd traffic is currently broken in NetMon's latest public parsers, but even when you parse it in WireShark, all you can see is the encryption type and a payload of encrypted goo. For example, here is that Windows 7 client talking to a Windows Server 2008 R2 DC, which means AES-256:

image
Aka: Insane-O-Cryption ™

On the other hand, if using a crusty OS like Windows XP, you end up using a legacy password mechanism that worked with NT 4.0 – in this case SamrUnicodeChangePasswordUser2 (http://msdn.microsoft.com/en-us/library/cc245708(v=PROT.10).aspx).

XP also supports the Kerberos change mechanism, but by default uses NTLM with CTRL+ALT+DEL password changes. Witness:

image

This uses “RPC over SMB with Named Pipes” with RPC packet privacy. You are using NTLM v2 by default (unless you set LMCompatibility unwisely) and you are still double-protected (the payload and packets), which makes it relatively safe. Definitely not as safe as Win7 though – just another reason to move forward.

image

You can disable NTLM in the domain if you have Win2008 R2 DCs and XP is smart enough to switch to using Kerberos here:

image

... but you are likely to break many other apps. Better to get rid of Windows XP.

2. A lot of administrative code uses SamrSetInformationUser2, which does not require knowing the user’s current password (http://msdn.microsoft.com/en-us/library/cc245793(v=PROT.10).aspx). For example, when you use NET USER to change a domain user’s password:

image

This invokes SamrSetInformationUser2 to set Internal4InformationNew data:

image

So, doubly-protected (a cryptographically generated, key signed hash covered by an encrypted payload). This is also “RPC over SMB using Named Pipes”

image

The crypto for the encrypted payload is derived from a key signed using the underlying authentication protocol, seen from a previous session setup frame (negotiated as Kerberos in this case):

image

3. The legacy mechanisms to change a user password are NetUserChangePassword (http://msdn.microsoft.com/en-us/library/windows/desktop/aa370650(v=vs.85).aspx) and IADsUser::ChangePassword (http://msdn.microsoft.com/en-us/library/windows/desktop/aa746341(v=vs.85).aspx)

4. A local user password change usually involves SamrUnicodeChangePasswordUser2, SamrChangePasswordUser, or SamrOemChangePasswordUser2 (http://msdn.microsoft.com/en-us/library/cc245705(v=PROT.10).aspx).

There are other ways but those are mostly corner-case.

Note: In my examples, I am using the most up to date Netmon 3.4 parsers from http://nmparsers.codeplex.com/.

Question

If I try to remove the AD Domain Services role using ServerManager.msc, it blocks me with this message:

image

But if I remove the role using Dism.exe, it lets me continue:

image

This completely hoses the DC and it no longer boots normally. Is this a bug?

And - hypothetically speaking, of course - how would I fix this DC?

Answer

Don’t do that. :)

Not a bug, this is expected behavior. Dism.exe is a pure servicing tool; it knows nothing more of DCs than the Format command does. ServerManager and servermanagercmd.exe are the tools that know what they are doing.
Update: Although as Artem points out in the comments, we want you to use the Server Manager PowerShell and not servermanagercmd, which is on its way out.

To fix your server, pick one:

  • Boot it into DS Repair Mode with F8 and restore your system state non-authoritatively from backup (you can also perform a bare metal restore if you have that capability - no functional difference in this case). If you do not have a backup and this is your only DC, update your résumé.
  • Boot it into DS Repair Mode with F8 and use dcpromo /forceremoval to finish what you started. Then perform metadata cleanup. Then go stand in the corner and think about what you did, young man!

Question

We are getting Event ID 4740s (account lockout) for the AD Guest account throughout the day, which is raising alerts in our audit system. The Guest account is disabled, expired, and even renamed. Yet various clients keep locking out the account and creating the 4740 event. I believe I've traced it back to the occasional attempt of a local account attempting to authenticate to the domain. Any thoughts?

Answer

You'll see that when someone has set a complex password on the Guest account, using NET USER for example, rather than having it be the null default. The clients never know what the guest password is; they always assume it's null like the default - so if you set a password on it, they will fail. Fail enough and you lock out (unless you turn that policy off and replace it with intrusion detection and two-factor auth). Set it back to null and you should be ok. As you suspected, there are a number of times when Guest is used as part of a "well, let's try that" algorithm:

Network access validation algorithms and examples for Windows Server 2003, Windows XP, and Windows 2000

To set it back you just use the Reset Password menu in Dsa.msc on the guest account, making sure not to set a password and clicking ok. You may have to adjust your domain password policy temporarily to allow this.

As for why it's "locking out" even though it's disabled and renamed:

  • It has a well-known SID (S-1-5-21-domain-501) so renaming doesn’t really do anything except tick a checkbox on some auditor's clipboard
  • Disabled accounts can still lock out if you keep sending bad passwords to them. Usually no one notices though, and most people are more concerned about the "account is disabled" message they see first.

Question

What are the steps to change the "User Account" password set when the Network Device Enrollment Service (NDES) is installed?

Answer

When you first install the Network Device Enrollment Service (NDES), you have the option of setting the identity under which the application pool runs to the default application pool identity or to a specific user account. I assume that you selected the latter. The process to change the password for this user account requires two steps -- with 27 parts (not really…).

  1. First, you must reset the user account's password in Active Directory Users and Computers.

  2. Next, you must change the password configured in the application pool Advanced Settings on the NDES server.

a. In IIS manager, expand the server name node.

b. Click on Application Pools.

c. On the right, locate and highlight the SCEP application pool.

image

d. In the Action pane on the right, click on Advanced Settings....

e. Under Process Model click on Identity, then click on the … button.

image

f. In the Application Pool Identity dialog box, select Custom account and then click on Set….

g. Enter the custom application pool account name, and then set and confirm the password. Click Ok, when finished.

image

h. Click Ok, and then click Ok again.

i. Back on the Application Pools page, verify that SCEP is still highlighted. In the Action pane on the right, click on Recycle….

j. You are done.

Normally, you would have to be concerned with simply resetting the password for any service account to which any digital certificates have been assigned. This is because resetting the password can result in the account losing access to the private keys associated with those certificates. In the case of NDES, however, the certificates used by the NDES service are actually stored in the local computer's Personal store and the custom application pool identity only has read access to those keys. Resetting the password of the custom application pool account will have no impact on the master key used to protect the NDES private keys.

[Courtesy of Jonathan, naturally - Neditor]

Question

If I have only one domain in my forest, do I need a Global Catalog? Plenty of documents imply this is the case.

Answer

All those documents saying "multi-domain only" are mistaken. You need GCs - even in a single-domain forest - for the following:

(Update: Correction on single-domain forest logon made, thanks for catching that Yusuf! I also added a few more breakage scenarios)

  • Perversely, if you have enabled IgnoreGCFailures (http://support.microsoft.com/kb/241789): turning it on removes universal groups from the user security token, meaning users will logon but not be able to access resources they accessed fine previously.
  • If your users logon with UPNs and try to change their password (they can still logon in a single domain forest with UPN or NetBiosDomain\SamAccountName style logons).
  • Even if you use Universal Group Membership Caching to avoid the need for a GC in a site, that DC needs a GC to update the cache.
  • MS Exchange is deployed (All versions of Exchange services won't even start without a GC).
  • Using the built-in Find in the shell to search AD for published shares, published DFS links, or published printers, or using any object picker dialog that provides the "entire directory" option, will fail.
  • DPM agent installation will fail.
  • AD Web Services (aka AD Management Gateway) will fail.
  • CRM searches will fail.
  • Probably other third parties of which I'm not aware.

We stopped recommending that customers use only handfuls of GCs years ago - if you get an ADRAP or call MS support, we will recommend you make all DCs GCs unless you have an excellent reason not to. Our BPA tool states that you should have at least one GC per AD site: http://technet.microsoft.com/en-us/library/dd723676(WS.10).aspx.

Question

If I use DFSR to replicate a folder containing symbolic links, will this replicate the source files or the actual symlinks? The DFSR FAQ says symlink replication is supported under certain circumstances.

Answer

The symlink replicates; however, the underlying data does not replicate just because there is a symlink. If the data is not stored within the RF, you end up with a replicated symlink to nowhere:

Server 1, replicating a folder called c:\unfiltersub. Note how the symlink points to a file that is not in the scope of replication:

image

Server 2, the symlink has replicated - but naturally, it points to an un-replicated file. Boom:

image

If the source data is itself replicated, you’re fine. There’s no real way to guarantee that though, except preventing users from creating files outside the RF by using permissions and FSRM screens. If your end users can only access the data through a share, they are in good shape. I'd imagine they are not the ones creating symlinks though. ;-)
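
If you want to reproduce the “symlink to nowhere” behavior in a lab, here is a quick sketch (the paths are hypothetical, and creating symlinks needs an elevated prompt):

    # Sketch: create a file symlink inside the replicated folder that points outside it.
    # Only the link replicates; the target file does not.
    cmd /c mklink C:\unfiltersub\pointer.txt C:\NotReplicated\target.txt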

Question

I read your post on career development. There are many memory techniques and I know everyone is different, but what do you use?

[A number of folks asked this question - Neditor]

Answer

When I was younger, it just worked - if I was interested in it, I remembered it. As I get older and burn more brain cells though, I find that my best memory techniques are:

  • Periodic skim and refresh. When I have learned something through deep reading and hands-on work, I try to skim through core topics at least once a year. For example, I force myself to scan the diagrams in all the Win2003 Technical Reference A-Z sections, and if I can’t remember what a diagram is saying, I make myself read that section in detail. I don’t let myself get too stale on anything and try to jog it often.
  • Mix up the media. When learning a topic, I read, find illustrations, and watch movies and demos. When there are no illustrations, I use Visio to make them for myself based on reading. When there are no movies, I make myself demo the topics. My brain seems to retain more info when I hit it with different styles on the same subject.
  • I teach and publicly write about things a lot. Nothing hones your memory like trying to share info with strangers, as the last thing I want is to look like a dope. It makes me prepare and check my work carefully, and that natural repetition – rather than forced “read flash cards”-style repetition – really works for me. My brain runs best under pressure.
  • Your body is not a temple (of Gozer worshipers). Something of a cliché, but I gobble vitamins, eat plenty of brain foods, and work out at least 30 minutes every morning.

I hope this helps and isn’t too general. It’s just what works for me.

Other Stuff

Have $150,000 to spend on a camera, a clever director who likes FPS gaming, and some very fit paint ballers? Go make a movie better than this. Watch it multiple times.

image
Once for the chat log alone

Best all-around coverage of the Frankfurt Auto Show here, thanks to Jalopnik.

image
Want!

The supposedly 10 Coolest Death Scenes in Science Fiction History. But any list not including Hudson’s last moments in Aliens is fail.

If it’s true… holy crap! Ok, maybe it wasn’t true. Wait, HOLY CRAP!

So many awesome things combined.

Finally, my new favorite time waster is Retronaut. How can you not like a website with things like “Celebrities as Russian Generals”.

image
No, really.

Have a nice weekend folks,

- Ned “Oh you want some of this?!?!” Pyle

AD FS 2.0 Claims Rule Language Primer


Hi guys, Joji Oshima here again. On the Directory Services team, we get questions regarding the Claims Rule Language in AD FS 2.0 so I would like to go through some of the basics. I’ve written this article for those who have a solid understanding of Claims-based authentication. If you would like to read up on the fundamentals first, here are some good resources.

An Introduction to Claims
http://msdn.microsoft.com/en-us/library/ff359101.aspx

Security Briefs: Exploring Claims-Based Identity
http://msdn.microsoft.com/en-us/magazine/cc163366.aspx

AD FS 2.0 Content Map
http://social.technet.microsoft.com/wiki/contents/articles/2735.aspx

Claims Rules follow a basic pipeline. The rules define which claims are accepted, processed, and eventually sent to the relying party. You define claims rules as a property of the Claims Provider Trust (incoming) and the Relying Party Trust (outgoing).

image
Basic flowchart for the Claims Pipeline taken from TechNet.

There is also an authorization stage that checks whether the requestor has access to receive a token for the relying party. You can choose to allow all incoming claims through by setting the Authorization Rules to Permit All. Alternately, you could permit or deny certain users based on their incoming claim set. You can read more about authorization claim rules here and here.
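
For reference, a permit-all authorization rule is just an unconditional issuance of the permit claim; the rule behind the Permit All option looks roughly like this:

    => issue(Type = "http://schemas.microsoft.com/authorization/claims/permit", Value = "true");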

You can create the majority of claims issuance and claims transformations using a Claim Rule Template in AD FS 2.0 Management console, but there are some situations where a custom rule is the only way to get the results you need. For example, if you want to combine values from multiple claims into a single claim, you will need to write a custom rule to accomplish that. To get started, I would recommend creating several rules through the Claim Rule Templates and view the rule language generated. Once you save the template, you can click the View Rule Language button from the Edit Rule window to see how the language works.

image

image

In the screenshot above, the rule translates as follows:

If (there is an incoming claim that matches the type "http://contoso.com/department")

Then (issue a claim with the type "http://adatum.com/department", using the Issuer, Original Issuer, Value, and ValueType of the incoming claim)

The claims "http://contoso.com/department" and "http://adatum.com/department" are URIs. These claims can be in the URN or HTTP format. The HTTP format is NOT a URL and does not have to specifically link to actual content on the Internet or intranet.

Claims Rule Language Syntax:

Typically, the claims rule language is structured similarly to an “if statement” in many programming languages.

If (condition is true)

Then (issue a claim with this value)

What this says is “if a condition is true, issue this claim”. A special operator “=>” separates the condition from the issuance statement and a semicolon ends the statement.

Condition statement => issuance statement;

Review some of the claims you created and look at the structure. See if you can pick out each part. Here is the one we looked at in the first section. Let’s break it down into the basic parts.

image

The “if statement” condition:

c:[Type == http://contoso.com/department]

The special operator:

=>

The issuance statement:

issue(Type = "http://adatum.com/department", Issuer = c.Issuer, OriginalIssuer = c.OriginalIssuer, Value = c.Value, ValueType = c.ValueType);

For each rule defined, AD FS checks the input claims, evaluates them against the condition, and issues the claim if the condition is true. You probably notice the variable “C” in the syntax. Think of “C” as an incoming claim that you can check conditions against, and use values from it to add to an outgoing claim. In this example, we are checking if there is an incoming claim that has a type that is “http://contoso.com/department”. We also use the values in this claim to assign the value of Issuer, OriginalIssuer, Value, and ValueType to the outgoing claim.

There are exceptions to this that are discussed later (using ADD instead of ISSUE and issuing a claim without a condition statement).

Issue a claim to everyone:

In the Claims Rule Language, the condition part is optional. Therefore, you can choose to issue or add a claim regardless of what claims are incoming. To do this, start with the special operator “=>”.

Syntax:

=> issue(type = "http://contoso.com/partner", value = "Adatum");

This syntax will issue a claim type “http://contoso.com/partner” with a value of “Adatum”

You could set similar rules for each Claims Provider Trust so that the Relying Party (or application) can know where the user came from.

Using a Single Condition:

In this example, we will look at a single condition statement. A basic claim rule checks to see if there is an incoming claim with a certain type and if so, issue a claim.

c:[Type == "http://contoso.com/role"]
 => issue(claim = c);

This syntax will check to see if there is an incoming claim with the type “http://contoso.com/role” and, if so, issue the exact same claim going out.

You can create this claim rule using the GUI. Choose the template named “Pass Through or Filter an Incoming Claim” and choose the appropriate incoming claim type.

image
Screenshot: Entries for a simple pass through claim.

You may also check for multiple values within your condition statement. For example, you can check and see if there is an incoming claim with a specific value. In the following example, we will check for an incoming claim with the type “http://contoso.com/role” that has the value of “Editors” and, if so, issue the exact same claim.

c:[Type == "http://contoso.com/role", Value=="Editors"]
 => issue(claim = c);

You can create this claim rule using the GUI as well. Choose “Pass Through or Filter an Incoming Claim”, choose the appropriate incoming claim type, select “Pass through only a specific claim value”, then enter the appropriate value.

image
Screenshot: Entries to pass through the Role claim if the value is “Editors”

Using Multiple Conditions:

Say you want to issue the Editor role claim only if the user has both an Editor role claim and an Email claim. To have multiple conditions, we will use multiple “C” variables. We will join the two condition statements with the special operator “&&”.

c1:[Type == "http://contoso.com/role", Value=="Editors"] &&
c2:[Type == "http://contoso.com/email"]
 => issue(claim = c1);

The first condition (c1) checks to see if you have an incoming role claim with the value of Editors. The second condition (c2) checks to see if there is an incoming email claim. If both conditions are met, it will issue an outgoing claim identical to the incoming c1 claim.

Combining Claim Values:

Say you want to join information together from multiple incoming claims to form a single outgoing claim. The following example will check for an incoming claim type of "http://contoso.com/location" and “http://contoso.com/role”. If it has both, it will issue a new claim, “http://contoso.com/targeted”, combining the two values.

c1:[Type == "http://contoso.com/location"] &&
c2:[Type == "http://contoso.com/role"]
 => issue(Type="http://contoso.com/targeted", Value=c1.value+" "+c2.value);

The resulting value is the value of the first claim (c1), plus a space, plus the value of the second claim (c2). You can combine static strings with the values of the claims using the special operator “+”. The example below shows a sample set of incoming claims, and the resulting output claim.

Example Incoming Claims:
"http://contoso.com/location" is "Seattle"
"http://contoso.com/role" is "Editor"

Example Outgoing Claim:
"http://contoso.com/targeted" is "Seattle Editor"

Using ADD instead of ISSUE:

As mentioned in an earlier section, you can ADD a claim instead of ISSUE a claim. You may be wondering what the difference between these two statements is. Using the ADD command instead of the ISSUE command will add a claim to the incoming claim set. This will not add the claim to the outgoing token. Use this for adding placeholder data to use in subsequent claims rules.

image

This illustration was taken from a TechNet article. Here you can see that the first rule adds a role claim with the value of Editor. It then uses this newly added claim to create a greeting claim. Assuming these are the only two rules, the outgoing token will only have a greeting claim, not a role claim.

I’ve outlined another example below.

Sample Rule 1:

c:[Type == "http://contoso.com/location", Value=="NYC"]
 => add(Type = "http://contoso.com/region", Value = "East");

Sample Rule 2:

c:[Type == "http://contoso.com/location", Value=="LAX"]
 => add(Type = "http://contoso.com/region", Value = "West");

Sample Rule 3:

c1:[Type == "http://contoso.com/location"] &&
c2:[Type == "http://contoso.com/region"]
 => issue(Type="http://contoso.com/area", Value=c1.value+" "+c2.value);

In this example, we have two rules that ADD claims to the incoming claim set, and one that issues a claim to the outgoing claim set. The first two rules add a region claim to the incoming claim set, and the third combines its value with the location claim to create an area claim. The ADD functionality is especially useful together with the aggregate functions covered in the next section.

Using aggregate functions (EXISTS and NOT EXISTS):

Using aggregate functions, you can issue or add a single output claim instead of getting an output claim for each match. The aggregate functions in the Claims Rule Language are EXISTS and NOT EXISTS.

Say we want to use the location claim, but not all users have it. Using NOT EXISTS, we can add a universal location claim if the user does not have one.

In Sample Rule 1, we will add a location claim with the value of “Unknown” if the user does not have a location claim. In Sample Rule 2, we will use that value to generate the “http://contoso.com/targeted” claim.

Sample Rule 1:

NOT EXISTS([Type == "http://contoso.com/location"])
 => add(Type = "http://contoso.com/location", Value = "Unknown");

Sample Rule 2:

c1:[Type == "http://contoso.com/location"] &&
c2:[Type == "http://contoso.com/role"]
 => issue(Type="http://contoso.com/targeted", Value=c1.value+" "+c2.value);

This way, users without the "http://contoso.com/location" claim can still get the "http://contoso.com/targeted" claim.

Claims Rule Language, beyond this post:

There is more you can do with the Claims Rule Language that goes beyond the scope of this blog post. If you would like to dig deeper by using Custom Attribute Stores and using Regular Expressions in the language, I’ve put up a TechNet Wiki article that contains these advanced topics and other sample syntax. In addition, some other articles may help with these topics.

Understanding Claim Rule Language in AD FS 2.0:
http://social.technet.microsoft.com/wiki/contents/articles/4792.aspx

When to Use a Custom Claim Rule:
http://technet.microsoft.com/en-us/library/ee913558(WS.10).aspx

The Role of the Claim Rule Language:
http://technet.microsoft.com/en-us/library/dd807118(WS.10).aspx

The Role of the Claims Engine:
http://technet.microsoft.com/en-us/library/ee913582(WS.10).aspx

The Role of the Claims Pipeline:
http://technet.microsoft.com/en-us/library/ee913585(WS.10).aspx

Conclusion:

Creating custom rules with the Claims Rule Language gives you more flexibility than the standard templates. Syntax familiarization takes a while, but with some practice, you should be able to write custom rules in no time. In your lab environment, start by writing custom rules instead of using the templates, and build on those.

- Joji “small claims court” Oshima


Oh man, I seriously overslept


Hi folks, Ned here again. We haven’t posted anything in weeks here, and I apologize for that; a perfect storm (of busy) happened. I’ll have a mail sack tomorrow and in the meantime, here’s our old pal Mark with a DFSN article that shares a really slick technique. Enjoy.

- Ned “excuses excuses” Pyle

DFS Override Referral Ordering, Messing with the Natural Order


Hi everyone. This is your friendly (debatable) PFE, Mark Renoden again. Today I’m talking about DFS Override Referral Ordering – a seldom-used feature with an interesting benefit. For the purpose of this discussion, I’ll refer to the following Active Directory Site Diagram:

image

For the entire discussion, let’s suppose our client is in the site Spoke-A and is accessing a DFS Folder target.

Referral Configuration

DFS Namespaces allow three options for the ordering of referrals. These are available in the Properties page of the namespace (and optionally overridden using the properties page of individual DFS folders).

image

Note the highlighted text. DFS referrals list DFS targets in the same site as the client, first. As an administrator, you have the option to return DFS targets outside the client’s site in random order, by lowest cost or not at all. The last option – Exclude targets outside of the client’s site – is also known as INSITE.

To see an explanation of Active Directory Site Topology impact on DFS referrals, look here.

I’m going to ignore “Random order” in this discussion. It’s self-explanatory – your DFS referral list for targets outside the client site is random (huh, I just explained it).

Referrals Returned by Lowest Cost

Assume I’ve configured my namespace to return referrals by lowest cost. For my client residing in site Spoke-A, the DFS referral process will offer the ordered list:

DFS Target A
DFS Target Hub
<random ordering of DFS Target B and DFS Target C>
<random ordering of DFS Target D and DFS Target E>

Site Hub has a total cost of 100 from site Spoke-A and is listed first in the out-of-site order.

Sites Spoke-B and Spoke-C have a total cost of 200 from site Spoke-A and are randomly listed next.

Sites Spoke-D and Spoke-E have a total cost of 250 from site Spoke-A and are randomly listed last.

DFS Override Referral Ordering

Now let’s look at the setting we’ve all come to see. DFS Override Referral Ordering is a property set on a DFS target. This could be a DFS Namespace server or a DFS Folder target.

image

I’ll walk through the effects of each option shown here when set on DFS Target B.

First Among All Targets

The DFS referral process will offer the ordered list:

DFS Target B
DFS Target A
DFS Target Hub
DFS Target C
<random ordering of DFS Target D and DFS Target E>

Last Among All Targets

The DFS referral process will offer the ordered list:

DFS Target A
DFS Target Hub
DFS Target C
<random ordering of DFS Target D and DFS Target E>
DFS Target B

First Among Targets of Equal Cost

The DFS referral process will offer the ordered list:

DFS Target A
DFS Target Hub
DFS Target B
DFS Target C
<random ordering of DFS Target D and DFS Target E>

Last Among Targets of Equal Cost

The DFS referral process will offer the ordered list:

DFS Target A
DFS Target Hub
DFS Target C
DFS Target B
<random ordering of DFS Target D and DFS Target E>

As you can see, these options give predictable results. Now for the cool bit …

INSITE + DFS Override Referral Ordering

As I mentioned earlier, INSITE (or Exclude targets outside of the client’s site) will cause the DFS referral process to offer only:

DFS Target A

If we combine this with DFS Override Referral Ordering set to Last Among all Targets on DFS Target Hub, the DFS referral process will offer:

DFS Target A
DFS Target Hub

In other words, our local target first and the hub target second, with no other out-of-site referrals. This would be desirable in environments where the network is not fully routable and client connectivity is limited to the local site and the hub site.

If more than one target has DFS Override Referral Ordering set to the same value, those targets will be returned in random order at the appropriate point in the referral list (i.e. site costing is ignored for those targets with override settings). For example, if DFS Target Hub and DFS Target B were configured with DFS Override Referral Ordering set to Last Among all Targets in combination with INSITE, the DFS referral process will offer:

DFS Target A
<random ordering of DFS Target Hub and DFS Target B>

Lastly, First/Last Among Targets of Equal Cost have no effect when INSITE is set.

Conclusion

DFS Override Referral Ordering has some interesting applications and allows you to steer clients to targets in environments where site-costed referrals are not ideal.

- Mark “Be Glad I’m not a Geneticist” Renoden

Friday Mail Sack: They Pull Me Back in Edition


Hiya world, Ned is back with your best questions and comments. I’ve been off to teach this fall’s MCM, done Win8 stuff, and generally been slacking keeping busy; sorry for the delay in posting. That means a hefty backlog - get ready to slurp.

Today we talk:

I know it was you, Fredo.

Question

If I run netdom query dc only writable DCs are returned. If I instead run nltest /dclist:contoso.com, both writable and RODCs are returned. Is it by design that netdom can't find RODC?

Answer

It’s by design, but not by any specific intention. Netdom was written for NT 4.0 and uses a very old function when you invoke QUERY DC, which means that if a domain controller is not of type SV_TYPE_DOMAIN_CTRL or SV_TYPE_DOMAIN_BAKCTRL, it is not shown in the list. Effectively, it queries for all the DCs just like Nltest, but it doesn’t know what RODCs are, so it won’t show them to you.

Nltest is old too, but its owners have updated it more consistently. When it returns all the DCs (using what amounts to the same lookup functions), it knows modern information. For instance, when it became a Win2008 tool, its owners updated it to use the DS_DOMAIN_CONTROLLER_INFO_3 structure, which is why it can tell you the FQDN, which servers are RODCs, who the PDCE is, and what sites map to each server.

image

When all this new RODC stuff came about, the developers either forgot about Netdom or more likely, didn’t feel it necessary to update both with redundant capabilities – so they updated Nltest only. Remember that these were formerly out-of-band support tools that were not owned by the Windows team until Vista/2008 – in many cases, the original developers had been gone for more than a decade.

Now that we’ve decided to make PowerShell the first class citizen, I wouldn’t expect any further improvements in these legacy utilities.

Question

We’re trying to use DSRevoke on Win2008 R2 to enumerate access control entries. We are finding it spits out: “Error occurred in finding ACEs.” This seems to have gone belly up in Server 2008. Is this tool in fact deprecated, and if so do you know of a replacement?

Answer

According to the download page, it only works on Win2003 (Win2000 being its original platform, and being dead). It’s not an officially supported tool in any case – just made by some random internal folks. You might say it was deprecated the day it released. :)

I also find that it fails as you said on Win2008 R2, so you are not going crazy. As for why it’s failing on 2008 and 2008 R2, I have not the foggiest idea, and I cannot find any info on who created this tool or if it even still has source code (it is not in the Windows source tree, I checked). I thought at first it might be an artifact of User Account Control, but even on a Win2008 R2 Core server, it is still a spaz.

I don’t know of any purpose-built replacements, although if I want to enumerate access on OUs (or anything), I’d use AD PowerShell and Get-ACL. For example, a human-readable output:

import-module activedirectory

cd ad:

get-acl (get-adobject "<distinguished name of the object>") | format-list

image

Or to get all the OUs:

get-acl(get-adorganizationalunit –filter *) | fl

image

Or fancy spreadsheets using select-object and export-csv (note – massaged in Excel, it won’t come out this purty):

image

image
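
The screenshots carry the actual command, so here is just a rough sketch of the kind of pipeline I mean - the output path and the properties you pick are entirely up to you:

import-module activedirectory
cd ad:
get-acl (get-adorganizationalunit -filter *) |
    select-object PSPath, Owner, Sddl |
    export-csv c:\temp\ou-acls.csv -NoTypeInformation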

Or whatever. The world is your oyster at that point.

You can also use Dsacls.exe, but it’s not as easy to control the output. And there are the fancy/free Quest AD PowerShell tools, but I can’t speak to them (Get-QADPermission is the cmdlet for this).

Question

We are thinking about removing evil WINS name resolution from our environment. We hear that this has been done successfully in several organizations. Is there anything we need to watch out for in regards to Active Directory infrastructure? Are there any gotchas you've seen with environments in general? Also, it seems that the days of WINS may be numbered. Can you offer any insight into this?

Answer

Nothing “current” in Windows has any reliance on WINS resolution – even the classic components like DFS Namespaces have long ago offered DNS alternatives - but legacy products may still need it. I’m not aware of any list of Microsoft products with all dependencies, but we know Exchange 2003 and 2007 require it, for instance (and 2010 does not). Anything here that requires port 137 Netbios name resolution may fail if it doesn’t also use DNS. Active Directory technologies do not need it; they are all from the DNS era.

A primary limitation of WINS and NetBT is that they do not support IPv6, so anything written for Server 2008 and up will have been tested with DNS-only name resolution. If you have legacy applications with a WINS dependency for specific static records, and they are running at least Server 2008 for DNS, you can replace the single-label resolution functionality provided by WINS with the DNS GlobalNames zone. See http://technet.microsoft.com/en-us/library/cc731744.aspx. Do not disable the TCP/IP NetBIOS Helper service on any computers, even if you get rid of WINS. All heck will break loose.
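
If you do go the GlobalNames route, the setup boils down to something like this sketch - the alias and target names below are made up, and the TechNet link above has the full, supported steps:

rem Enable GlobalNames zone support on each DNS server
dnscmd . /config /enableglobalnamessupport 1

rem Create the AD-integrated GlobalNames zone, replicated forest-wide
dnscmd . /zoneadd GlobalNames /dsprimary /dp /forest

rem Add a single-label alias for a legacy application server
dnscmd . /recordadd GlobalNames APPSRV1 CNAME appsrv1.contoso.com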

Rest assured that WINS is still included in the Windows 8 Server Developer Preview, and Microsoft itself still runs many WINS servers; odds are good that you have at least 12 more years of WINS in your future. Yay!

I expect to hear horror stories in the Comments…

Question

What is the expected behavior with respect to any files created in DFSR-replicated folders if they're made prior to initial sync completion? I.e. data in the replicated folder is added or modified on the non-authoritative server during the initial sync?

Answer

  1. If it’s a brand new file created by the user on the downstream, or if the file has already “replicated” from the upstream (meaning that its hash and File ID are now recorded by the downstream server, not that the file actually replicates) and is later changed by the user before initial replication is fully complete, nothing “bad” happens. Once initial sync completes, their original changes and edits will replicate back outbound without issues.
  2. If the user has bad timing and starts modifying existing pre-seeded files that have not yet had their file ID and hashes replicated (which would probably take a really big dataset combined with a really poor network), their files will get conflicted and changes wiped out, in favor of the upstream server.

Question

During initial DFSR replication of a lot of data, I often see debug log messages like:

20111028 17:06:30.308 9092 CRED   105 CreditManager::GetCredits [CREDIT] No update credits available. Suspending Task:00000000010D3850 listSize:1 this:00000000010D3898

 

20111028 17:06:30.308 9092 IINC   281 IInConnectionCreditManager::GetCredits [CREDIT] No connection credits available, queuing request. totalConnectionCreditsGranted:98 totalGlobalCreditsGranted:98 csId:{6A576AEE-561E-8F93-8C99-048D2348D524} csName:Goo connId:{B34747C-4142-478F-96AF-D2121E732B16} sessionTaskPtr:000000000B4D5040

And just what are DFSR “Credits?” Does this amount just control how many files can be replicated to a partner before another request has to be made?  Is it a set amount for a specific amount of time per server?

Answer

Not how many files, per se - how many updates. A credit maps to a "change" - create, modify, delete.  All the Credit Manager code does is allow an upstream server to ration out how many updates each downstream server can request in a batch. Once that pool is used up, the downstream can ask again. It ensures that one server doesn't get to replicate all the time and other servers never replicate - except in Win2003/2008, this still happened. Because we suck. In Win2008 R2, the credit manager now correctly puts you to the back of the queue if you just showed up asking for more credits, and gives other servers a chance. As an update replicates, a credit is "given back" until your list is exhausted. It has nothing to do with time, just work.

"No update credits available" is normal and expected if you are replicating a bung-load of updates. And in initial sync, you are.

Question

The registry changes I made after reading your DFSR tuning article made a world of difference. I do have a question though: is the max number of replicating servers only 64?

Answer

Not the overall max, just the max simultaneously. I.e. 64 servers replicating a file at this exact instant in time. We have some customers with more than a thousand replicating servers (thankfully, using pretty static data).

Question

Can members of the Event Log Readers group automatically access all event logs?

Answer

Almost all. To see the security on any particular event log, you can use wevtutil gl <log name>. For example:

wevtutil gl security

image

Note the S-1-5-32-573 SID there on the end – that is the Event Log Readers well-known built-in SID. If you wanted to see the security on all your event logs, you could use this in a batch file (wraps):

@echo off

if exist %temp%\eventlistmsft.txt del %temp%\eventlistmsft.txt

if exist %temp%\eventlistmsft2.txt del %temp%\eventlistmsft2.txt

Wevtutil el > %temp%\eventlistmsft.txt

For /f "delims=;" %%i in (%temp%\eventlistmsft.txt) do wevtutil gl "%%i" >> %temp%\eventlistmsft2.txt

notepad %temp%\eventlistmsft2.txt

My own quick look showed that a few do not ACL with that group – Internet Explorer, Microsoft-Windows-CAPI2, Microsoft-Windows-Crypto-RNG, Group Policy, Microsoft-Windows-Firewall with advanced security. IE seems like an accident, but the others were likely just considered sensitive by their developers.

Other stuff

Happy Birthday to Bill Gates and to Windows XP. You’re equally responsible for nearly every reader or writer of this blog having a job. And in my case, one not digging ditches. So thanks, you crazy kids.

The ten best Jeremy Clarkson Top Gear lines… in the world!

Halloween Part 1: Awesome jack-o-lantern templates, courtesy of ThinkGeek. Yes, they have NOTLD!

Halloween Part 2: Dogs in costume, courtesy of Bing. The AskDS favorite, of course, is:

image

 

Thanks to Japan, you can now send your boss the most awesome emoticon ever, when you fix an issue but couldn’t get root cause:

¯\_(ツ)_/¯

Pluto returning to planet status? It better be; that do-over was lame…

Finally – my new favorite place to get Sci-Fi and Fantasy pics is Cgsociety. Check out some of 3D and 2D samples from the Showcase Gallery:

 

clip_image002 clip_image004
clip_image006 clip_image008
clip_image010 clip_image012
clip_image014
That last one makes a great lock screen

Have a great weekend, folks.

- Ned “They hit him with five shots and he's still alive!” Pyle

Friday Mail Sack: Guest Reply Edition


Hi folks, Ned here again. This week we talk:

Let's gang up.

Question

We plan to migrate our Certificate Authority from single-tier online Enterprise Root to two-tier PKI. We have an existing smart card infrastructure. TechNet docs don’t really speak to this scenario in much detail.

1. Does migration to a 2-tier CA structure require any customization?

2. Can I keep the old CA?

3. Can I create a new subordinate CA under the existing CA and take the existing CA offline?

Answer

[Provided by Jonathan Stephens, the Public Keymaster- Editor]

We covered this topic in a blog post, and it should cover many of your questions: http://blogs.technet.com/b/askds/archive/2010/08/23/moving-your-organization-from-a-single-microsoft-ca-to-a-microsoft-recommended-pki.aspx.

Aside from that post, you will also find the following information helpful: http://blogs.technet.com/b/pki/archive/2010/06/19/design-considerations-before-building-a-two-tier-pki-infrastructure.aspx.

To your questions:

  1. While you can migrate an online Enterprise Root CA to an offline Standalone Root CA, that probably isn't the best decision in this case with regard to security. Your current CA has issued all of your smart card logon certificates, which may have been fine when that was all you needed, but it certainly doesn't comply with best practices for a secure PKI. The root CA of any PKI should be long-lived (20 years, for example) and should only issue certificates to subordinate CAs. In a 2-tier hierarchy, the second tier of CAs should have much shorter validity periods (5 years) and is responsible for issuing certificates to end entities. In your case, I'd strongly consider setting up a new PKI and migrating your organization over to it. It is more work at the outset, but it is a better decision long term.
  2. You can keep the currently issued certificates working by publishing a final, long-lived CRL from the old CA. This is covered in the first blog post above. This would allow you to slowly migrate your users to smart card logon certificates issued by the new PKI as the old certificates expired. You would also need to continue to publish the old root CA certificate in the AD and in the Enterprise NTAuth store. You can see these stores using the Enterprise PKI snap-in: right-click on Enterprise PKI and select Manage AD Containers. The old root CA certificate should be listed in the NTAuthCertificates tab, and in the Certificate Authorities Container tab. Uninstalling the old CA will remove these certificates; you'll need to add them back.
  3. You can't take an Enterprise CA offline. An Enterprise CA requires access to Active Directory in order to function. You can migrate an Enterprise CA to a Standalone CA and take that offline, but, as I've said before, that really isn't the best option in this case.

Question

Are there any known issues with P2Ving ADAM/AD LDS servers?

Answer

[Provided by Kim Nichols, our resident ADLDS guru'ette - Editor]

No problems as far as we know. The same rules apply as P2V’ing DCs or other roles; make sure you clean up old drivers and decommission the physicals as soon as you are reasonably confident the virtual is working. Never let them run simultaneously. All the “I should have had a V-8” stuff.

Considering how simple it is to create an ADLDS replica, it might be faster and "cleaner" to create a new virtual machine, install and replicate ADLDS to it, then rename the guest and throw away the old physical; if ADLDS was its only role, naturally.

Question

[Provided by Fabian Müller, clever German PFE - Editor]

When using production delegation in AGPM, we can grant permissions for editing group policy objects in the production environment. But these permissions will be written to all deployed GPOs, not to specific ones. GPMC makes it easy to set “READ” and “APPLY” permissions on a GPO, but I cannot find a security filtering switch in AGPM. So how can we manage security filtering on group policies without setting the same ACL on all deployed policies?

Answer

Ok, granting “READ” and “APPLY” permissions - that is, managing security filtering - in AGPM is not that obvious to find. Do it like this in the change control panel of AGPM:

  • Check-out the appropriate Group Policy Object and provide a brief overview of the changes to be done in the “comments” window, e.g. “Add important security filtering ACLs for group XYZ, dude!”
  • Edit the checked-out GPO

At the top of the Group Policy Management Editor, click “Action” -> “Properties”:

image

  • Change to the “Security” tab and provide your settings for security filtering:

image

  • Close the Group Policy Management Editor and Check-in the policy (again with a good comment)
  • If everything is done, you can now safely “Deploy” the just-edited GPO – now the security filter is in place in production:

image

Note 1: Be aware that you won’t find any information regarding the security filtering change in the AGPM history of the edited group policy object. There is nothing in the HTML reports that refers to security filtering changes. That’s why you should provide a good explanation of your changes during the “check-in” and “check-out” phases:

image

image

Note 2: Be careful with “DENY” ACEs using AGPM – they might get removed. See the following blog for more information on that topic: http://blogs.technet.com/b/grouppolicy/archive/2008/10/27/agpm-3-0-doesnt-preserve-all-the-deny-aces.aspx

Question

I have one Windows Server 2003 IIS machine with two web applications, each in its own application pool. How can I register SPNs for each application?

Answer

[This one courtesy of Rob Greene, the Abominable Authman - Editor]

There are a couple of options for you here.

  1. You could address each web site on the same server with different host names.  Then you can add the specific HTTP SPN to each application pool account as needed.
  2. You could address each web site with a unique port assignment on the web server.  Then you can add the specific HTTP SPN with the port attached like http/myweb.contoso.com:88
  3. You could use the same account to run all the application pool accounts on the same web server.

NOTE: If you choose option 1 or 2, you have to be careful about Internet Explorer behaviors. If you choose a unique host name per web site, make sure to use HOST (A) records in DNS; if you use CNAME records instead, you will need to put a registry key in place on all workstations. If you choose a unique port for each web site, you will need to put a registry key in place on all workstations so that they send the port number in the TGS SPN request.

http://blogs.technet.com/b/askds/archive/2009/06/22/internet-explorer-behaviors-with-kerberos-authentication.aspx
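
For options 1 and 2 above, the registration itself is just setspn run against whatever account each application pool uses. Here’s a hedged sketch with made-up host and account names - on Win2003 use -A, while Win2008 and later also offer -S, which checks for duplicate SPNs first:

rem Site 1 answers on its own host name; its app pool runs as CONTOSO\svcApp1
setspn -A HTTP/app1.contoso.com CONTOSO\svcApp1

rem Site 2 shares the server's host name but listens on port 88; its app pool runs as CONTOSO\svcApp2
setspn -A HTTP/myweb.contoso.com:88 CONTOSO\svcApp2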

Question

Comparing AGPM controlled GPOs within the same domain is no problem at all – but if the AGPM server serves more than one domain, how can I compare GPOs that are hosted in different domains using AGPM difference report?

Answer

[Again from Fabian, who was really on a roll last week - Editor]

Since AGPM 4.0 we provide the ability to export and import Group Policy Objects using AGPM. What you have to do is:

  • To export one of the GPOs from domain 1…:

image

  • … and import the *.cab to domain 2 using the AGPM GPO import wizard (right-click on an empty area in the AGPM Contents > Controlled tab and select “New Controlled GPO…”):

image

image

  • Now you can simply compare those objects using difference report:

image

[Woo, finally some from Ned - Editor]

Question

When I use the Windows 7 (RSAT) version of AD Users and Computers to connect to certain domains, I get error "unknown user name or bad password". However, when I use the XP/2003 adminpak version, no errors for the same domain. There's no way to enter a domain or password.

Answer

ADUC in Vista/2008/7/R2 does some group membership and privilege checking when it starts that the older ADUC never did. You’ll get the logon failure message for any domain you are not a domain admin in, for example. The legacy ADUC is probably broken for that account as well – it’s just not telling you.

image

Question

I have 2 servers replicating with DFSR, and the network cable between them is disconnected. I delete a file on Server1, while the equivalent file on Server2 is modified. When the cable is re-connected, what is the expected behavior?

Answer

Last updater wins, even if it’s a modification of an ostensibly deleted file. If the file was deleted first on server 1 and modified later on server 2, it would replicate back to server 1 with the modifications once the network reconnected. If it had been deleted later than the modification, that “last write” would win and the file would be deleted from the other server once the network resumed.

More info on DFSR conflict handling here http://blogs.technet.com/b/askds/archive/2010/01/05/understanding-dfsr-conflict-algorithms-and-doing-something-about-conflicts.aspx

Question

Is there any automatic way to delete stale user or computer accounts? Something you turn on in AD?

Answer

Nope, not automatically; you have to create a solution to detect the age and disable or delete stale accounts. This is a very dangerous operation - make sure you understand what you are getting yourself into. For example:

Question

Whenever I try to use the PowerShell cmdlet Get-ACL against an object in AD, I always get an error like "Cannot find path ou=xxx,dc=xxx,dc=xxx because it does not exist". But it does!

Answer

After you import the ActiveDirectory module, but before you run your commands, run:

CD AD:

Get-Acl won’t work until you change to the magical “active directory drive”.
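
Put together, the whole thing looks something like this (the DN is just a placeholder):

import-module activedirectory
cd ad:
get-acl (get-adobject "OU=Sales,DC=contoso,DC=com") | format-list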

Question

I've read the Performance Tuning Guidelines for Windows Server, and I wonder if all SMB server tuning parameters (AsyncCredits, MinCredits, MaxCredits, etc.) also work (or help) for DFSR. Also, do you know what the limit is for SMB Asynchronous Credits? The document doesn’t say.

Answer

Nope, they won’t have any effect on DFSR – it does not use SMB to replicate files. SMB is only used by the DFSMGMT.MSC if you ask it to create a replicated folder on another server during RF setup. More info here:

Configuring DFSR to a Static Port - The rest of the story - http://blogs.technet.com/b/askds/archive/2009/07/16/configuring-dfsr-to-a-static-port-the-rest-of-the-story.aspx

That AsynchronousCredits SMB value does not have a true maximum, other than the fact that it is a DWORD and cannot exceed 4,294,967,295 (i.e. 0xffffffff). Its default value on Windows Server 2008 and 2008 R2 is 512; on Vista/7, it's 64.

HOWEVER!

As KB938475 (http://support.microsoft.com/kb/938475) points out, adjusting these defaults comes at the cost of paged pool (Kernel) memory. If you were to increase these values too high, you would eventually run out of paged pool and then perhaps hang or crash your file servers. So don't go crazy here.

There is no "right" value to set - it depends on your installed memory, if you are using 32-bit versus 64-bit (if 32-bit, I would not touch this value at all), the number of clients you have connecting, their usage patterns, etc. I recommend increasing this in small doses and testing the performance - for example, doubling it to 1024 would be a fairly prudent test to start.

Other Stuff

Happy Birthday to all US Marines out there, past and present. I hope you're using Veterans Day to sleep off the hangover. I always assumed that's why they made it November 11th, not that whole WW1 thing.

Also, happy anniversary to Jonathan, who has been a Microsoft employee for 15 years. In keeping with the tradition, he had 15 pounds of M&Ms for the floor, which, in case you’re wondering, fills a salad bowl. Which around here, means:

image

Two of the most awesome things ever – combined:

A great baseball story about Lou Gehrig, Kurt Russell, and a historic bat.

Off to play some Battlefield 3. No wait, Skyrim. Ah crap, I mean Call of Duty MW3. And I need to hurry up as Arkham City is coming. It's a good time to be a PC gamer. Or Xbox, if you're into that sorta thing.

 

Have a nice weekend folks,

 - Ned "and Jonathan and Kim and Fabian and Rob" Pyle

Removing DFSR Filters


Hi folks, Ned here again. DFSR administrators usually know about its built-in filtering mechanism. You can configure each filter based on file name and extension; by default, files named like “*.bak, *.tmp, ~*” are filtered, as they are unlikely to be permanent or useful between servers. You can also filter out folders; this is less common, as you cannot provide a full path - only a name. Sometimes though, it is useful for a specific working folder used by applications.
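
Incidentally, if you just want to see which filters are currently in effect on a member without opening the DFS Management console, the DFSR WMI provider exposes them - a quick sketch:

# Lists each replicated folder on this member with its current file and folder filters
Get-WmiObject -Namespace "root\microsoftdfs" -Class DfsrReplicatedFolderConfig |
    Select-Object ReplicatedFolderName, FileFilter, DirectoryFilter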

When you enable a filter, each server scans its database for that replicated folder path and removes any records of files and folders that match the filter. Because the database doesn’t care about filtered objects, DFSR ignores any future changes to the files. In practical filtering terms, this means:

1. Any new files/folders added do not replicate to any other server.

2. Files/folders that already exist through previous replication stay on all servers.

3. Files/folders previously replicated and then later modified after enabling the filter are deleted from all other servers. After all, they are filtered and therefore “no longer exist” when updated, according to the downstream servers.

But what about when you remove filters?

image

This is rare enough that we've never bothered to document it. There are key issues to understand:

1. You must install the latest DFSR hotfixes for your operating system, on all DFSR servers.

List of currently available hotfixes for Distributed File System (DFS) technologies in Windows Server 2008 and in Windows Server 2008 R2 - http://support.microsoft.com/kb/968429

List of currently available hotfixes for Distributed File System (DFS) technologies in Windows Server 2003 and in Windows Server 2003 R2 - http://support.microsoft.com/kb/958802

Do not continue with any filter changes until installing those hotfixes and restarting the DFSR servers*. There is a very nasty bug that leads to folders refusing to replicate their previously filtered contents, or to files that disappear from partner servers.

* Alternatively, you can stop the DFSR service and install the hotfixes. Generally, this removes the need to restart and no prompt displays.
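
In practice, that alternative looks something like this on each Win2008/R2 member (the .msu name is just a placeholder for whichever hotfix from the lists above applies to your OS and architecture):

net stop dfsr
rem install the appropriate hotfix package here; this file name is a placeholder
wusa.exe Windows6.1-KBnnnnnnn-x64.msu /quiet /norestart
net start dfsr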

Note:

I'm often asked if DFSR hotfixes are recommended preemptively and the answer is YES. Data loss hotfixes do not fix your lost data, only make further data loss stop! You should probably keep on top of NTFS hotfixes too.

2. Any changes made between servers after enabling the filter are going to result in some deleted or overwritten files, and that is by design. By setting a DFSR filter, you told DFSR that those folders and their contents no longer existed in DFSR terms. When the filter comes off, the first change made in some server’s copy of that folder kicks off the conflict resolution algorithm and the two folders synchronize, regardless of your wishes as to which folder should win.

Here I filtered subfolder2 and created – on each server – different bmp files named madeon1 (on server 01) and madeon2 (on server 02):

image

image

Then I removed the filter, forced AD replication, and forced DFSR to poll AD. Those files are different, so nothing “bad” happened and they sync:

image

However, if I repeat that un-filtering test with two files with the same name, but different contents (the file was originally replicated to both servers, filtered out, and then later modified on server 02):

image

image

Then the newer file will win, overwriting what’s on 01 (even if it was “good” data for some user). This is why when you filter folders from replication, you should make sure it’s not user data that is still modified on multiple servers. Someday it may have to re-converge.

image

3. It is critical that you back up filtered folders on all servers replicating its parent, as you are very likely to need to restore some files in order to undo some of the conflict resolution damage. Some users are going to be very unhappy otherwise – and if they are the Vice President of TV Programming and Microwave Oven Technologies, they will come down on you like a load of bricks.

A minority of customers use DFSR folder filters – they are not very flexible, due to their lack of path support. Filtered folders are safe only if users cannot alter them or their contents – then at least those users will not have a negative experience.

Until next time,

- Ned “Director of Directories” Pyle

More than you ever wanted to know about Remote Desktop Licensing


Hey everyone, David here. Here in support, there are certain types of calls that we love to get – because they’re cool or interesting, and when we figure them out, we feel like we’re making the world a better place. These are the calls that prompt us to write long-winded blog posts with lots of pretty pictures because we’re proud of ourselves for figuring out the issue.

Sadly, calls about Remote Desktop Licensing (formerly known as Terminal Services Licensing) aren’t those kinds of calls. Instead, they’re often questions about things that we really should have written down on the Internet a long time ago, so that people wouldn’t have to call us. Things like “How do I migrate my license server to a new OS version?” and “How does licensing server discovery really work?” That’s not to say that we don’t still get some interesting RDS Licensing calls, but most of them are run-of-the-mill. And to tell you the truth, we don’t want you to have to call us for stuff that you could have figured out if only someone had bothered to document it for you.

So, we did something that probably should have happened years ago: we went around the team and collected every scrap of knowledge we could find about RDS Licensing. We then scrubbed it (some of it was very dusty), made sure it was accurate, and, using liberal amounts of leftover Halloween candy from the bowl on Ned’s desk, bribed the team of writers that manages TechNet to make it freely available to everyone in one easy place.

On November 11th, we published Troubleshooting Remote Desktop Licensing Issues on TechNet.

Click that link, and you can find all sorts of information about things like:

  • The different types of CALs and how they work
  • License Server Discovery and how it works
  • How the Grace Period really works
  • Installing or Migrating CALs
  • Lots more useful stuff

We hope that someday it saves you a support call. And if there’s something RDS-related that you don’t see there, tell us about it in a comment. (Ned still has more Halloween candy for those writers). Enjoy!

image
No really… he does.

- David “Hydra” Beach

Friday Mail Sack: Dang, This Year Went Fast Edition


Hi folks, Ned here again with your questions and comments. This week we talk:

On Dasher! On Comet! On Vixen! On --- wait, why does the Royal Navy name everything after magic reindeer? You weirdoes. 

Question

I am planning to increase my forest Tombstone Lifetime and I want to make sure there are no lingering object issues created by this operation. I am using doGarbageCollection to trigger garbage collection immediately, but finding with an increased Garbage Collection logging level that this does not reset the 12-hour schedule, so collection runs again sooner than I hoped. Is this expected?

Answer

Yes. The rules for garbage collection are:

  1. Runs 15 minutes after the DC boots up (15 minutes after the NTDS service starts, in Win2008 or later)
  2. Runs every 12 hours (by default) after that first time in #1
  3. Runs on the interval set in attribute garbageCollPeriod if you want to override the default 12 hours (minimum supported is 1 hour, no less)
  4. Runs when forced with doGarbageCollection

Manually running collection does not alter the schedule or “reset the timer”; only the boot/service start changes that, and only garbageCollPeriod alters the next time it will run automagically.

Therefore, if you wanted to control when it runs on all DCs and get them roughly “in sync”, restarting all the DCs or their NTDS services would do it. Just don’t do that to all DCs at precisely the same time or no one will be able to logon, mmmmkaaay?
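
For reference, triggering an immediate collection is just an LDAP write of the doGarbageCollection operational attribute against the rootDSE of the DC you care about - here's a quick sketch using ADSI from PowerShell:

# Tell the local DC to run garbage collection right now; this does not change the 12-hour schedule
$rootDse = [ADSI]"LDAP://localhost/RootDSE"
$rootDse.Put("doGarbageCollection", 1)
$rootDse.SetInfo()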

Question

I’ve read your post on filtering group policy using WMI. The piece about Core versus Full was quite useful. Is there a way to filter based on installed roles and features though?

Answer

Yes, but only on Windows Server 2008 and later server SKUs, which support a class named Win32_ServerFeature. This class returns an array of ID properties that populates only after installing roles and features. Since this is WMI, you can use WMIC.EXE to see this before monkeying with the group policy:

image

So if you wanted to use the WQL filtering of group policy to apply a policy only to Win2008 FAX servers, for example:

image

On a server missing the FAX Server role, the policy does not apply:

image
If you still care about FAXes though, you have bigger issues. 
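
The screenshots above carry the actual syntax, but roughly speaking: you list what is installed with

wmic path Win32_ServerFeature get ID,Name

and the WQL filter you paste into the GPMC WMI filter dialog (root\CIMv2 namespace) would look something like the line below - I'm filtering on Name rather than hard-coding the numeric ID, and you should verify the exact Name string against the wmic output first:

SELECT * FROM Win32_ServerFeature WHERE Name = "Fax Server"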

Question

We’re having issues with binding Macs (OS 10.6.8 and 10.7) to our AD domain that uses a '.LOCAL’ suffix. Apple is suggesting we create IPv6 AAAA and PTR records for all our DCs. Is this the only solution, and could it cause issues?

Answer

That’s not the first time Apple has had issues with .local domains and may not be your only problem (here, here, here, etc.). Moreover, it’s not only Apple’s issue: .local is a pseudo top-level domain suffix used by multicast DNS. As our friend Mark Paris points out, it can lead to other aches and pains. There is no good reason to use .local and the MS recommendation is to register your top-level domains and then create forest roots as children of those: for example, Microsoft’s AD forest root domain is corp.microsoft.com, with geography denoting other domains, like redmond.corp.microsoft.com and emea.corp.microsoft.com; geography usually doesn’t change faster than networks. The real problem was timing: AD was in development several years before the .local RFC was released. Then mDNS variations had little usage in the next decade, compared to standard DNS. AD itself doesn’t care what you do as long as you use valid DNS syntax. Heck, we even used it automatically when creating Small Business Server domains.

Enough rambling. There should be no problem adding unused, internal network IPv6 addresses to DNS; Win2008 and later already have IPv6 ISATAP auto-assigned addresses that they are not using either. If that’s what fixes these Apple machines, that’s what you must do. You should also add matching IPv6 network “subnets” to all your AD sites as well, just to be safe.
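
Adding the records themselves is a one-liner per DC - a hedged example with made-up zone, host, and ULA address values (the matching PTR goes into the appropriate ip6.arpa reverse lookup zone):

dnscmd . /recordadd contoso.local dc01 AAAA fd00:1234::10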

Although if it were me, I’d push back on Apple to fix their real issue and work with this domain, as they have done previously. This is a client problem on their end that they need to handle – these domains predate them by more than a decade. All they have to do is examine the SOA record and it will be clear that this is an internal domain, then use normal DNS in that scenario.

Oh, or you could rename your forest.

BBWWWWAAAAAAAHAHAHAHAHHAHAHAHHAHAHAHHAHAHAHAHAAA.

Sorry, had to do it. ツ

Question

We were reviewing your previous site coverage blog post. If I use this registry sitecoverage item on DCs in the two different sites to cover a DC-less site, will I get some form of load balancing from clients in that site? I expect that all servers with this value set will create SRV records in DNS to cover the site, and that DNS will simply follow normal round-robin load balancing when responding to client requests. Is this correct?

Answer

[From Sean Ivey, who continues to rock even after he traitorously left us for PFE – Ned]

From a client perspective, all that matters is the response they get from DC/DNS from invoking DCLocator.  So for clients in that site, I don’t care how it happens, but if DCs from other sites have DNS records registered for the DC-less site, then typical DNS round robin will happen (assuming you haven’t disabled that on the DNS server).

For me, the question is… “How do I get DCs from other sites to register DNS records for the DC-less site?” Review this:

http://technet.microsoft.com/en-us/library/cc937924.aspx
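
If you go the registry route from that article, it boils down to a multi-string Netlogon parameter on whichever DC should do the covering - a sketch, using the site name from my lab below; restarting the Netlogon service afterwards (or waiting for the next periodic registration) gets the SRV records registered:

reg add HKLM\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters /v SiteCoverage /t REG_MULTI_SZ /d TestCoverage /f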

I’m partial to using group policy though.  I think it’s a cleaner solution.  You can find the GP setting that does the same thing here:

clip_image001

Simply enable the setting, enter the desired site, and make sure that it only applies to the DC’s you want it to apply to (you can do this with security filtering). 

Anyway, so I set this up in my lab just to confirm everything works as expected. 

My sites:

clip_image002

Notice TestCoverage has no DC’s.

My site links:

clip_image002[5]

Corp-HQ is my hub so auto site coverage should determine the DC’s in Corp-HQ are closest and should therefore cover site TestCoverage.

DNS:

clip_image002[7]

Whaddya know, Infra-DC1 is covering site TestCoverage as expected.

Next I enable the GPO I pointed out and apply it only to Infra-DC2 and voila!  Infra-DC2 (which is in the Corp-NA site) is now also covering the TestCoverage site:

clip_image002[9]

You have a slightly more complicated scenario because auto site coverage has to go one step farther (using the alphabet to decide who wins) but in the end, the result is the same. 

Question

We’re seeing very high CPU usage in DFSR and comparably poor performance. These are brand new servers - just unboxed from the factory - with excellent modern hardware. Are there any known issues that could cause this?

Answer

[Not mine, but instead paraphrased from an internal conversation with MS hardware experts; this resolved the issue – Ned]

Set the hardware C-State to maximize performance and not save power/lower noise. You must do this through the BIOS menu; it’s not a Microsoft software setting. We’ve also seen this issue with SQL and other I/O-intensive applications running on servers.

Question

Can NetApp devices host DFS Namespace folder targets?

Answer

This NetApp community article suggests that it works. Microsoft has no way to validate whether this is true or not, but it sounds OK. In general, any OS that can present a Windows SMB/CIFS share should work, but it’s good to ask.

Question

How much disk performance reduction should we expect with DFSR, DFSN, FRS, Directory Services database, and other Active Directory “stuff” on Hyper-V servers, compared to physical machines?

Answer

We published a Virtual Hard Disk Performance whitepaper without much fanfare last year. While it does not go into specific details around any of those AD technologies, it provides tons of useful data for other enterprise systems like Exchange and SQL. Those apps are very much “worst case”, as they tend to write much more than any of ours. It also thoroughly examines pure file IO performance, which makes for easy comparison with components like DFSR and FRS. It shows the metrics for physical disks, fixed VHD, dynamic VHD, and differencing VHD, plus it compares physical versus virtual loads (spoiler alert: physical is faster, but not as much as you might guess).

It’s an interesting read and not too long; I highly recommend it.  

Other Stuff

Joseph Conway (in black) was nearly beaten in his last marathon by a Pekinese:

clip_image00211
Looks ‘shopped, I’m pretty sure the dog had him

Weirdest Thanksgiving greeting I received last month? “Have a great Turkey experience.”

Autumn is over and Fail Blog is there (video SFW, site is often… not):

A couple of excellent “lost” interviews from Star Wars. Mark Hamill before the UK release of the first film and much of the cast just after Empire.

New York City has outdone its hipster’itude again, with some new signage designed to keep you from being horribly mangled. For example:

image
Ewww?

IO9 has their annual Christmas mega super future gift guide out and there are some especially awesome suggestions this year. Some of my favorites:

Make also has a great DIY gift guide. Woo, mozzarella cheese kit!

Still can’t find the right gift for the girls in your life? I recommend Zombie Attack Barbie.

On a related topic, Microsoft has an internal distribution alias for these types of contingencies:

image
“A group whose goal is to formulate best practices in order to ensure the safety of Microsoft employees, physical assets, and IP in the event of a Zombie Apocalypse.” 

Finally

This is the last mail sack before 2012, as I am a lazy swine going on extended vacation December 16th. Mark and Joji have some posts in the pipeline to keep you occupied. Next year is going to be HUGE for AskDS, as Windows 8 info should start flooding out and we have all sorts of awesome plans. Stay tuned.

Merry Christmas and happy New Year to you all.

- Ned “oink” Pyle


Effective Troubleshooting


Hi everyone. It’s Mark Renoden here again and today I’ll talk about effective troubleshooting. As I visit various customers, I’m frequently asked how to troubleshoot a certain problem, or how to troubleshoot a specific technology. The interesting thing for me is that these are really questions within a question – how do you effectively troubleshoot?

Before I joined Premier Field Engineering, I’d advanced through the ranks of Commercial Technical Support (CTS). Early on, my ability to help customers relied entirely on having seen the issue before or my knowledge base search skills. Over time, I got more familiar with the technologies and could feel my way through an issue. These days I’m more consciously competent and have a much better understanding of how to work on an issue – the specifics of the problem are less important. The realisation is that troubleshooting is a skill and it’s a skill more general than one technology, platform or industry.

I’d like to draw your attention to an excellent book on the topic –

Debugging by David J. Agans
Publisher: Amacom (September 12, 2006)
ISBN-10: 0814474578
ISBN-13: 978-0814474570

In his book, Agans discusses what he refers to as “… the 9 indispensable rules …” for isolating problems. I’ll be referring to these rules in the context of being an IT Professional.

Understand the System – Debugging, Chapter 3, pg 11

In order to isolate a problem, Agans discusses the need to understand the system you’re working with. Consider the following.

Purpose – What is the system designed to do and does this match your expectation? It’s surprising how often an issue has its roots in misunderstanding the capabilities of a technology.

Configuration – How was the system deployed and does that match intentions? Do you have a test environment? If you have a test environment, you can compare “good” with “bad” or even reproduce the issue and have a safe place to experiment with solutions.

Interdependencies – This is an important thing to understand. Take the example of DFSR – where there are dependencies on network connectivity/ports, name resolution, the file system and Active Directory. Problems with these other components could surface as symptoms in DFSR. Understanding the interplay between these “blocks” and what each “block” is responsible for will greatly assist you in isolating problems.

Tools – It could be argued that tools aren’t part of the system but without knowing how to interrogate each component, you’re unlikely to get very far. Log files, event logs, command line utilities and management UIs all tell you something about configuration and behaviour. Further to this, you need to know how to read and interpret the output. Your tools might include log processing scripts or even something as obscure as an Excel pivot table.

If you don't know how the system works, look it up. Seek out every piece of documentation you can find and read it. Build a test environment and experiment with configuration. Understand what “normal” looks like.

Check the Plug – Debugging, Chapter 9, pg 107

Start at the beginning and question your assumptions. Don't rule out the obvious and instead, check the basics. More than a few issues have dragged on too long after overlooking something simple in the early stages of investigation. Can servers ping each other? Does name resolution work? Does the disk have free space?

Do your tools do what you think they do? If you have doubts, it’s time to review your understanding of the system.

Are you misinterpreting data? Try not to jump to conclusions and try to verify your results with another tool. If you hear yourself saying, “I think this data is telling me …” find a way to test your theory.

Divide and Conquer – Debugging, Chapter 6, pg 67

Rather than trying to look at everything in detail, narrow the scope. Divide the system into pieces and verify the behaviour in each area before you get too deep.

  • Does the problem occur for everybody or just a few users?
  • Is every client PC affected or those in just one site?
  • What’s common when the problem occurs?
  • What’s different when the problem is absent?

When you’ve isolated the problem to a specific component or set of components, your knowledge of the system and the tools you can use to gather detail come into play.

Given a known input, what’s the expected output for each dependent component?

A great suggestion discussed by Agans is to start with the symptoms and work back towards the problem. Each time you fail to identify the issue, rule out the working component. This approach is highly useful when there are multiple problems contributing to the symptoms. Address each one as you find it and test for success.

Make it Fail – Debugging, Chapter 4, pg 25

Understanding the conditions that reproduce the problem is an essential step in troubleshooting. When you can reliably reproduce the symptoms, you can concisely log the failure or focus your analysis to a specific window in time. A network capture that begins immediately before you trigger a failure and ends immediately after is a great deal easier to work with than one containing a million frames of network activity in which perhaps twenty are useful to your diagnosis.

Another essential concept covered by Agans is that being able to reproduce an issue on demand provides a sure fire test to confirm a resolution, and that this is difficult if the problem is intermittent. Intermittent problems are just problems that aren’t well understood. If they only occur sometimes, you don’t understand all of the conditions that make them occur. Gather as many logs as you can, compare failures with successes and look for trends.

Quit Thinking and Look – Debugging, Chapter 5, pg 45

Perception and experience are not root cause – they only guide your investigation. It’s essential that you look for information and evidence. As an example, I recently worked on a DFSR issue in which a huge backlog was being generated. After talking with the customer, we had our suspicions about root cause but, as it turned out, a thorough investigation that combined the use of DFSR debug logs and Process Monitor revealed there were two root causes, neither of which had anything to do with our original ideas.

Only make changes when the change is simpler than collecting evidence, won’t cause any damage, and is reversible.

Consider data-gathering points in the system and which tools or instrumentation expose behaviour, but also take care that using tools or turning on instrumentation doesn’t alter the system behaviour. Time-sensitive issues are an example where monitoring may hide the symptoms.

Don’t jump to conclusions. Prove your theories.

Change One Thing at a Time – Debugging, Chapter 7, pg 83

Earlier I suggested having a test environment so you could compare “good” with “bad”. Such an environment also allows you to narrow your options for change and to understand possible causes for a problem.

Whether you’re able to refine your list of possibilities or not, it’s important to be systematic when making changes in the system. Make one change at a time and review the behaviour. If the change has no effect, reverse it before moving on.

Another consideration is whether the system ever worked as expected. You may be able to use change records to identify a root cause if you have a vague idea of when the system was last working.

Keep an Audit Trail – Debugging, Chapter 8, pg 97

Don’t rely on your memory. You’re busy – you’ll forget. Keep track of what you’ve done, in which order and how it affected the system. Detail is important and especially so when you’re handing the issue over to a colleague. During my time in CTS, we’d pass cases between each other all the time and sometimes without a face to face handover. Good, detailed case notes were important to a smooth transition.

Get a Fresh View – Debugging, Chapter 10, pg 115

Talk the problem through with a colleague. I’ve had many experiences where I’ve realised how to tackle a problem by just talking about it with another engineer. The act of explaining the facts and clarifying the problem so that someone else could understand it gave me the insight needed to take the next step.

Don’t cloud their view with your own interpretation of the symptoms. Explain the facts and give your colleague a chance to make their own conclusions.

Don’t be embarrassed or too proud to ask for help. Be eager to learn from others – the experience of others is a great learning tool.

If You Didn’t Fix It, It Ain’t Fixed – Debugging, Chapter 11, pg 125

Check that it’s really fixed and try to “make it fail” after you’ve deployed your solution. Reverse the fix and check that it’s broken again. Problems in IT don’t resolve themselves – if symptoms cease and you don’t know why, you’ve missed key details.

- Mark “cut it out with a scalpel” Renoden

Slow-Link with Windows 7 and DFS Namespaces

Hey there, Gary here, and it has been a while since there has been a blog post relating to Offline Files and slow-link. A couple of years ago I wrote one in relation to Vista and the slow-link mode policy that still applies to Windows 7; here is the link:

“Configure slow-link mode” policy on Vista for Offline Files

As a quick refresher for those that do not want to read it right now, some takeaways from that post:

  • TCP/SMB stacks - and not the Network Location Awareness (NLA) service - measure link speed/latency
  • The help text is still confusing, but remember: do not include the quotes when entering the share and latency/throughput settings. We have gotten used to this by now and don’t see as many misconfigurations anymore
  • A slow-link policy had to be defined to allow a cached location to be taken offline due to link speed
  • The matching algorithm for path comparison and support for wildcards is still the same, with the same effect for DFS Namespaces (refer to blog post for explanation)

Ok, so what has changed that is worth writing about?

Default Slow-link defined…again

Windows XP has a default slow link threshold of 64Kbps that was nice and easy to measure... NOT! Windows Vista went for an opt-in approach with the definition of the slow-link policy. Now, Windows 7 brings back the default threshold idea. This time it is a default latency of 80ms. This means that any location the cache is aware of and has cached data for can be taken offline when the network latency is measured above 80ms, without having to set a slow-link policy. Of course, if you desire different settings, you can still define your own policy. The default threshold amounts to the following type of policy definition:

image
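
The policy screenshot isn’t reproduced here, but based on the Value Name / Value format used later in this post, the implied built-in definition is roughly:

Value Name: *
Value: Latency=80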

When a location does transition offline due to slow link, the “Applications and Services Logs\Microsoft\Windows\OfflineFiles\Operational” event log records the following event and lists the latency measured when it went offline:

<==================================================================>

Log Name: Microsoft-Windows-OfflineFiles/Operational

Source: Microsoft-Windows-OfflineFiles

Date: 11/27/2011 9:05:47 AM

Event ID: 1004

Task Category: None

Level: Information

Keywords: Online/offline transitions

User: CONTOSO\offlineuser

Computer: Win7Client.contoso.com

Description:

Path \\server\shared$ transitioned to slow link with latency = 120 and bandwidth = 155272

<==================================================================>

Auto Transition back to Online State

Windows 7 also brought back the ability for a cached network location that is offline due to slow link to transition back into an online state when network conditions improve, without additional intervention. If you ever dealt with Windows Vista and defined the policy, you know that you had to click the “Work Online” button or use a script to transition the location back into an online state.

After a transition, client-side caching performs this network check every 2 minutes. Windows Vista also checked, but only for locations offline due to network disconnection, interruption and the like. This does not necessarily mean that the location will transition back online the moment the check completes. A location that transitions offline is kept in the offline state for a default minimum of 5 minutes.

When the location transitions back into an online state thanks to improved network conditions, the following message records in the “OfflineFiles\Operational” event log:

<==================================================================>

Log Name: Microsoft-Windows-OfflineFiles/Operational

Source: Microsoft-Windows-OfflineFiles

Date: 11/27/2011 9:11:38 AM

Event ID: 1005

Task Category: None

Level: Information

Keywords: Online/offline transitions

User: CONTOSO\offlineuser

Computer: Win7Client.contoso.com

Description:

Path \\server\shared$ transitioned to online with latency = 11

<==================================================================>
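
If you want to track these transitions without opening Event Viewer, a minimal PowerShell sketch along these lines should work (Get-WinEvent is built into Windows 7):

# List recent slow-link (1004) and back-online (1005) transitions
Get-WinEvent -FilterHashtable @{
    LogName = 'Microsoft-Windows-OfflineFiles/Operational'
    Id      = 1004, 1005
} -MaxEvents 20 | Format-List TimeCreated, Id, Message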

That is all great but how does that affect DFSN?

That is a good question! It is actually part of what led to this blog post and the need for the additional information above. We have been seeing an increased number of calls where part of a DFS namespace is not accessible when it goes offline. Let us briefly examine a simple namespace that hosts at least one DFS folder. The namespace will include the folders as described below:

image

  • I defined a Folder Redirection policy to redirect the user's My Documents folder to the user folder under \\contoso.com\DFSRoot\userdata. By default, this makes the user’s My Documents folder available offline and places its information in the cache. The user’s home drive is also mapped to this location as drive H:.
  • Drive S: is mapped to the Shared DFS folder (or some folder underneath it).
  • The user is at a remote branch with a limited WAN link back and wants to access a document he was working on under the Shared folder. However, the user is unable to access that Shared folder through either the UNC path or the mapped drive he happens to have to that location. He sees this error message:

 

image

 

Or

image

However, he is still able to browse and access his Documents (mapped to drive H:) just fine in the same DFS namespace. That would be expected whether on or off the network, since it was made available in the cache by the Folder Redirection policy.

Looking at a file in the user’s home directory, it might show as “Offline slow link”, “Offline (disconnected)”, or “Offline (working offline)”. In addition, there could be messages in the “OfflineFiles\Operational” event log for ID 9 and/or 1004 referencing the DFS root. These events indicate that the root is offline due to network disconnection, a manual transition, or link speed.

Another quick way to see if the namespace is offline is to run “DIR \\contoso.com\dfsroot”:

image

How can the namespace go offline, when nothing is available offline from there?

When dealing with DFS, Offline Files breaks the path down into parts that equate to the namespace and the DFS folders, and each part is evaluated independently. Therefore, the namespace can be offline while the user’s folder is online, or vice versa. The cache still needs to know some information about the namespace, because it is part of the path that is available offline. That means the default latency can apply to the namespace even though nothing under it is directly made available offline.

In the example above, the S: drive was mapped to a DFS folder that didn’t have anything made available offline under it. But because the namespace was offline, DFS referrals were not evaluated and no traffic left the box for that path; hence the error message.

Can we get the top area of the namespace to stay online?

If the DFS namespace is transitioning to slow-link mode, you can counteract the default latency by specifying an additional policy setting. You might consider something like the following to keep the namespace online while allowing the userdata DFS folder content to go offline at the specified latency (refer to the blog post mentioned at the start for more information on the pattern matching):

Value Name: *
Value: Latency=32000 (overrides the default 80ms latency for all locations)

-or-

Value Name: \\contoso.com\dfsroot
Value: Latency=32000 (this allows the default 80ms to apply to other locations)

and, in either case:

Value Name: \\contoso\dfsroot\userdata
Value: Latency=60

* Allows the userdata link to go offline while other links that are not cached also stay online

Summary

The long and short of this is a tale of the default latency applying where you may not think it should. That behavior can be overridden by defining your own slow-link policy that sets the bar high enough not to take the location offline when you truly do not want it to go offline.

Gary "high latency" Mudgett

Winter Break

Hiya folks, Ned here. It's that time of year again, where the AskDS team goes on hiatus to play Call of Duty and Skyrim (er, spend time with family and friends). Please save your emails and questions until the second week of January. No one can hear your screams.

If you’re still scrambling for last-minute Christmas shopping ideas, I recommend the IO9 and ArsTechnica gift guides. Much more importantly, if you’re wishing to make a difference for an underprivileged child, I recommend Toys for Tots.

From everyone here at AskDS, we wish you and your kinfolk a very Merry Christmas and a happy New Year.

image
Make sure you leave the flue open

See you in 2012, everyone.

- Ned Pyle

Understanding the AD FS 2.0 Proxy

Hi guys, Joji Oshima here again. I have had several cases involving the AD FS 2.0 Proxy and there is some confusion on what it is, why you should use it, and how it works. If you are looking for basic information on AD FS, I would check out the AD FS 2.0 Content Map. The goal of this post is to go over the purpose of the AD FS 2.0 Proxy, why you would want to use it, and how it fits with the other components.

What is the AD FS 2.0 Proxy?

The AD FS 2.0 Proxy is a service that brokers a connection between external users and your internal AD FS 2.0 server. It acts as a reverse proxy and typically resides in your organization’s perimeter network (aka DMZ). As far as the user is concerned, they do not know they are talking to an AD FS proxy server, as the federation services are accessed by the same URLs. The proxy server handles three primary functions.

  • Assertion provider: The proxy accepts token requests from users and passes the information over SSL (default port 443) to the internal AD FS server. It receives the token from the internal AD FS server and passes it back to the user.
  • Assertion consumer: The proxy accepts tokens from users and passes them over SSL (default port 443) to the internal AD FS server for processing.
  • Metadata provider: The proxy will also respond to requests for Federation Metadata.

Why use an AD FS 2.0 Proxy?

The AD FS 2.0 Proxy is not a requirement for using AD FS; it is an additional feature. The reason you would install an AD FS 2.0 Proxy is that you do not want to expose the actual AD FS 2.0 server to the Internet. AD FS 2.0 servers are domain-joined resources, while the AD FS 2.0 Proxy does not have that requirement. If all your users and applications are internal to your network, you do not need to use an AD FS 2.0 Proxy. If there is a requirement to expose your federation service to the Internet, it is a best practice to use an AD FS 2.0 Proxy.

How does the AD FS 2.0 Proxy Work?

The claims-based authentication model expects the user to have direct access to the application server and the federation server(s). If you have an application (or web service) that is Internet facing, and the AD FS server is on the internal network, this can cause an issue. A user on the Internet can contact the application (or web service), but when the application redirects the user to the AD FS server, it will not be able to connect to the internal AD FS server. Similarly, with IDP-Initiated Sign on, a user from the Internet would not be able to access the sign on page. One way to get around this would be to expose the AD FS server to the Internet; a better solution is to utilize the AD FS 2.0 Proxy service.

In order to understand how the proxy works, it is important to understand the basic traffic flow for a token request. I will be using a simple example where there is a single application (relying party) and a single federation server (claims provider). Below you will see an explanation of the traffic flow for an internal user and for an external user in a WS-Federation Passive flow example.

image

For an internal user (see diagram above):

1. An internal user accesses a claims-aware application

2. The application redirects the user to the AD FS server

3. The AD FS server authenticates the user and performs an HTTP post to the application where the user gains access

Note: The redirects are performed using a standard HTTP 302 Redirect.

The posts are performed using a standard HTTP POST.

image

For an external user (see diagram above):

1. An external user accesses a claims-aware application

2. The application redirects the user to the AD FS 2.0 proxy server

3. The proxy server connects to the internal AD FS server and the AD FS server authenticates the user

4. The AD FS 2.0 proxy performs an HTTP Post to the application where the user gains access

Note: Depending on the infrastructure configuration, complexity, protocol, and binding the traffic flow can vary.

Basic Configuration of the AD FS 2.0 Proxy:

Configure DNS:

Configuring DNS is a very important step in this process. Applications, services, and other federation service providers do not know if there is a proxy server, so all redirects to the federation server will have the same DNS name (ex: https://sts.contoso.com/adfs/ls/) which is also the federation service name. See this article for guidance on selecting a Federation Service Name. It is up to the administrator to configure the internal DNS to point to the IP address of the internal AD FS server or internal AD FS server farm load balancer, and configure the public DNS to point to the IP address of the AD FS 2.0 Proxy Server or AD FS Proxy server farm load balancer. This way, internal users will directly contact the AD FS server, and external users will hit the AD FS 2.0 proxy, which brokers the connection to the AD FS server. If you do not have a split-brain DNS environment, it is acceptable and supported to use the HOSTS file on the proxy server to point to the internal IP address of the AD FS server.

SSL Certificates:

The internal AD FS server can have a certificate issued by your enterprise CA (or a public authority), and it should have a subject name that matches the Federation Service name/DNS name it is accessed with. Subject Alternative Names (SAN) and wildcard certificates are supported as well. The AD FS 2.0 proxy needs to have an SSL certificate with the same subject name. Typically, you want this certificate to be from a public authority that is trusted and part of the Microsoft Root Certificate Program. This is important because external users may not inherently trust your internal enterprise CA. This article can step you through replacing the certificates on the AD FS 2.0 server.

Firewalls:

  • Internal and external users will need to access the application over SSL (typically port 443)
  • The AD FS 2.0 Proxy Server will need to access the internal AD FS server over SSL (default port 443)
  • Internal users will need to access the internal Federation Service on its SSL port (TCP/443 by default)
  • External users will need to access the Federation Service Proxy on its SSL port (TCP/443 by default)

How does the AD FS 2.0 Proxy establish trust?

The Proxy Trust Wizard prompts for admin credentials for the internal federation service (AD FS). These credentials are not stored; they are used once to issue a proxy trust token (which is simply a SAML assertion) that “authenticates” the proxy to the internal federation service. The internal AD FS server knows about the proxy trust token and expects every proxy request it receives to be accompanied by that token.

The proxy trust token has a configurable lifetime, and is self-maintained by the proxy and the federation service. The only time you need to touch it is if a server is lost or you need to revoke the proxy trust.

When a proxy trust is revoked, the proxy trust token is invalidated and the federation service will no longer accept proxy requests from proxies who are attempting to utilize that token. You must re-run the Proxy Trust Wizard on ALL proxies in order to re-establish trust.

Using PowerShell:

Using PowerShell is an easy way to view and set configuration items regarding the proxy server. I’ve listed the more common commands and parameters used to configure the AD FS 2.0 proxy.

On the proxy server:
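
(The original listing here was a screenshot. As a hedged sketch, the proxy-side configuration is exposed through the Microsoft.Adfs.PowerShell snap-in; verify the exact parameter names against the AD FS 2.0 Cmdlets reference linked at the end of this post.)

Add-PSSnapin Microsoft.Adfs.PowerShell

# View the current proxy configuration
Get-ADFSProxyProperties

# Example only (hypothetical value) - adjust proxy settings such as the federation service host name
# Set-ADFSProxyProperties -HostName "sts.contoso.com"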

On the federation server:
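
(Again, a hedged sketch in place of the original screenshot; the proxy trust token lifetime discussed in the previous section is the setting you are most likely to touch here.)

Add-PSSnapin Microsoft.Adfs.PowerShell

# View federation service settings, including ProxyTrustTokenLifetime (in minutes)
Get-ADFSProperties

# Example only (hypothetical value) - change the proxy trust token lifetime
# Set-ADFSProperties -ProxyTrustTokenLifetime 240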

Conclusion:

I hope you now have a better idea of how the AD FS 2.0 Proxy works and the basics on how it should be configured. If you want to dig deeper, there are some excellent resources for the AD FS 2.0 Proxy.

Planning Federation Server Proxy Placement
http://technet.microsoft.com/en-us/library/dd807130%28WS.10%29.aspx

Certificate Requirements for Federation Server Proxies
http://technet.microsoft.com/en-us/library/dd807054%28WS.10%29.aspx

AD FS 2.0: How to Replace the SSL, Service Communications, Token-Signing, and Token-Decrypting Certificates
http://social.technet.microsoft.com/wiki/contents/articles/2554.aspx

Troubleshooting federation server proxy problems with AD FS 2.0
http://technet.microsoft.com/en-us/library/adfs2-troubleshooting-federation-server-proxy-problems%28WS.10%29.aspx

AD FS 2.0: Guidance for Selecting and Utilizing a Federation Service Name
http://social.technet.microsoft.com/wiki/contents/articles/4177.aspx

AD FS 2.0 Proxy Management
http://blogs.msdn.com/b/card/archive/2010/06/02/ad-fs-2-0-proxy-management.aspx

[MS-SAMLPR]: Security Assertion Markup Language (SAML) Proxy Request Signing Protocol Specification
http://msdn.microsoft.com/en-us/library/ff470131(v=PROT.13).aspx

[MS-MFPP]: Federation Service Proxy Protocol Specification
http://msdn.microsoft.com/en-us/library/dd357118(v=PROT.13).aspx

AD FS 2.0 Cmdlets in Windows PowerShell
http://technet.microsoft.com/en-us/library/ee892329.aspx

Joji “happy as a claim” Oshima

Friday Mail Sack: Best Post This Year Edition

Hi folks, Ned here and welcoming you to 2012 with a new Friday Mail Sack. Catching up from our holiday hiatus, today we talk about:

So put down that nicotine gum and get to reading!

Question

Is there an "official" stance on removing built-in admin shares (C$, ADMIN$, etc.) in Windows? I’m not sure this would make things more secure or not. Larry Osterman wrote a nice article on its origins but doesn’t give any advice.

Answer

The official stance is from the KB that states how to do it:

Generally, Microsoft recommends that you do not modify these special shared resources.

Even better, here are many things that will break if you do this:

Overview of problems that may occur when administrative shares are missing
http://support.microsoft.com/default.aspx?scid=kb;EN-US;842715

That’s not a complete list; it wasn’t updated for Vista/2008 and later. Frankly, the list of breakage is so long that there’s no point finishing it. Removing these shares does not increase security, as only administrators can use those shares and you cannot prevent administrators from putting them back or creating equivalent custom shares.

This is one of those “don’t do it just because you can” customizations.

Question

The Windows PowerShell Get-ADDomainController cmdlet finds DCs, but not much actual attribute data from them. The examples on TechNet are not great. How do I get it to return useful info?

Answer

You have to use another cmdlet in tandem, without pipelining: Get-ADComputer. The Get-ADDomainController cmdlet is good mainly for searching, and Get-ADComputer does not accept pipeline input from it. Instead, you use a pseudo “nested function” to first find the PDC, then get data about that DC. For example (this is all one command, wrapped):

get-adcomputer (get-addomaincontroller -Discover -Service "PrimaryDC").name -property * | format-list operatingsystem,operatingsystemservicepack

When you run this, PowerShell first processes the commands within the parentheses, which finds the PDC. Then it runs get-adcomputer, using the property of “Name” returned by get-addomaincontroller. Then it passes the results through the pipeline to be formatted. So it’s 1 2 3.

Voila. Here I return the OS of the PDC, all without having any idea which server actually holds that role:

clip_image002[6]

Moreover, before the Internet clubs me like a baby seal: yes, a more efficient way to return data is to ensure that the –property list contains only those attributes desired:

image
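
The screenshot isn’t reproduced here; reconstructed from the earlier command, the trimmed version would look roughly like this (one command, wrapped):

get-adcomputer (get-addomaincontroller -Discover -Service "PrimaryDC").name -property operatingsystem,operatingsystemservicepack | format-list operatingsystem,operatingsystemservicepack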

Get-ADDomainController can find all sorts of interesting things via its –service argument:

PrimaryDC
GlobalCatalog
KDC
TimeService
ReliableTimeService
ADWS

The Get-ADDomain cmdlet can also find FSMO role holders and other big picture domain stuff. For example, the RID Master you need to monitor.
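
For instance, a minimal sketch that lists the role holders (property names as returned by the Active Directory module):

# Domain-level FSMO role holders, including the RID Master mentioned above
Get-ADDomain | Format-List PDCEmulator, RIDMaster, InfrastructureMaster

# Forest-level role holders
Get-ADForest | Format-List SchemaMaster, DomainNamingMaster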

Question

I know about Kerberos “token bloat” with user accounts that are members of too many groups. Does this also affect computers added to too many groups? What would some practical effects of that be? We want to use a lot of them in the near future for some application … stuff.

Answer

Yes, things will break. To demonstrate, I used PowerShell to create 2,000 groups in my domain and added a computer named “7-01” to them:

image
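
The screenshot of the commands isn’t reproduced here; a rough equivalent of what was run (group names and OU path are hypothetical) looks like this:

# Create 2000 global groups and add computer 7-01 to each one - test lab only!
$computer = Get-ADComputer "7-01"
1..2000 | ForEach-Object {
    New-ADGroup -Name "TokenBloatGroup$_" -GroupScope Global -Path "OU=Test,DC=contoso,DC=com"
    Add-ADGroupMember -Identity "TokenBloatGroup$_" -Members $computer
}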

I then restart the 7-01 computer. Uh oh, the System event log is un-pleased. At this point, 7-01 is no longer applying computer group policy, running startup scripts, or allowing any of its services to log on remotely to DCs:

image 

Oh, and check out this gem:

image

I’m sure no one will go on a wild goose chase after seeing that message. Applications will be freaking out even more, likely with the oh-so-helpful error 0x80090350:

“The system detected a possible attempt to compromise security. Please ensure that you can contact the server that authenticated you.”

Don’t do it. MaxTokenSize is probably in your future if you do, and it has limits that you cannot design your way out of. IT uniqueness is bad.

Question

We have XP systems using two partitions (C: and D:) migrating to Windows 7 with USMT. The OS is on C: and the user profiles are on D:. We’ll use that D: partition to hold the USMT store. After migration, we’ll remove the second partition and expand the first partition to use the space freed up by the second partition.

When restoring via loadstate, will the user profiles end up on C: or on D:? If the profiles end up on D:, we obviously will not be able to delete the second partition, and we want to stop using a second partition regardless.

Answer

You don’t have to do anything; it just works. Because the new profile destination is on C, USMT just slots everything in there automagically :). The profiles will be on C and nothing will be on D except the store itself and any non-profile folders*:

clip_image001
XP, before migrating

clip_image001[5]
Win7, after migrating

If users have any non-profile folders on D, that will require a custom rerouting xml to ensure they are moved to C during loadstate and not obliterated when D is deleted later. Or just add a MOVE line to whatever DISKPART script you are using to expand the partition.

Question

Should we stop the DFSR service before performing a backup or restore?

Answer

Manually stopping the DFSR service is not recommended. When backing up using the DFSR VSS Writer – which is the only supported way – replication is stopped automatically, so there’s no reason to stop the service or need to manually change replication:

Event ID=1102
Severity=Informational
The DFS Replication service has temporarily stopped replication because another
application is performing a backup or restore operation. Replication will resume
after the backup or restore operation has finished.

Event ID=1104
Severity=Informational
The DFS Replication service successfully restarted replication after a backup
or restore operation.

Another bit of implied evidence – Windows Server Backup does not stop the service.

Stopping the DFSR service for extended periods leaves you open to the risk of a USN journal wrap. And what if someone/something thinks that the service being stopped is “bad” and starts it up in the middle of the backup? Probably nothing bad happens, but certainly nothing good. Why risk it?
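
If you want to confirm that a backup paused and resumed replication cleanly, a quick sketch against the DFS Replication event log (using the event IDs above) could look like this:

# Look for the backup pause (1102) and resume (1104) events noted above
Get-WinEvent -FilterHashtable @{
    LogName = 'DFS Replication'
    Id      = 1102, 1104
} -MaxEvents 10 | Format-List TimeCreated, Id, Message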

Question

In an environment where AGPM controls all GPOs, what is the best practice when application setup routines make edits "under the hood" to GPOs, such as the Default Domain Controllers GPO? For example, Exchange setup makes changes to User Rights Assignment (SeSecurityPrivilege). Obviously, if this setup process makes such edits on the live GPO in SYSVOL, the changes will take effect, only to have those critical edits lost and overwritten the next time an admin re-deploys with AGPM.

Answer

[via Fabian “Wunderbar” Müller  – Ned]

From my point of view:

1. The Default Domain and Default Domain Controller Policies should be edited very rarely. Manual changes as well as automated changes (e.g. by the mentioned Exchange setup) should be well known and therefore the workaround in 2) should be feasible.

2. After those planned changes are performed, you have to use “import from production” to bring the production GPO into the AGPM archive in order to reflect the production change in AGPM. Another way could be to periodically “import from production” the default policies, or to implement a manual/human process that requires an “import from production” before a change to these policies is made using AGPM.

Not a perfect answer, but manageable.

Question

In testing the rerouting of folders, I took this example from TechNet and placed it in a separate custom.xml. When using this custom.xml along with the other defaults (migdocs.xml and migapp.xml unchanged), the EngineeringDrafts folder is copied to %CSIDL_DESKTOP%\EngineeringDrafts, but there’s also a copy at C:\EngineeringDrafts on the destination computer.

I assume this is not expected behavior.  Is there something I’m missing?

Answer

Expected behavior, pretty well hidden though:

http://technet.microsoft.com/en-us/library/dd560751(v=WS.10).aspx

If you have an <include> rule in one component and a <locationModify> rule in another component for the same file, the file will be migrated in both places. That is, it will be included based on the <include> rule and it will be migrated based on the <locationModify> rule

That original rerouting article could state this more plainly, I think. Hardly anyone does this relativemove operation; it’s very expensive for disk space – one of those “you can, but you shouldn’t” capabilities of USMT. The first example also has an invalid character in it (the apostrophe in “user’s” on line 12, position 91 – argh!).

Don’t just comment out those areas in migdocs though; you are then turning off most of the data migration. Instead, create a copy of the migdocs.xml and modify it to include your rerouting exceptions, then use that as your custom XML and stop including the factory migdocs.xml.

There’s an example attached to this blog post down at the bottom. Note the exclude in the System context and the include/modify in the user context:

image

image

Don’t just modify the existing migdocs.xml and keep using it un-renamed either; that becomes a versioning nightmare down the road.

Question

I'm reading up on CAPolicy.inf files, and it looks like there is an error in the documentation that keeps being copied around. TechNet lists RenewalValidityPeriod=Years and RenewalValidityPeriodUnits=20 under the "Windows Server 2003" sample. This is the opposite of the Windows 2000 sample, and intuitively the "PeriodUnits" should be something like "Years" or "Weeks", while the "Period" would be an integer value. I see this on AskDS here and here also.

Answer

[via Jonathan “scissor fingers” Stephens  – Ned]

You're right that the two settings seem like they should be reversed, but unfortunately this is not correct. All of the *Period values can be set to Minutes, Hours, Days, Weeks, Months or Years, while all of the *PeriodUnits values should be set to some integer.

Originally, the two types of values were intended to be exactly what one intuitively believes they should be -- *PeriodUnits was to be Day, Weeks, Months, etc. while *Period was to be the integer value. Unfortunately, the two were mixed up early in the development cycle for Windows 2000 and, once the error was discovered, it was really too late to fix what is ultimately a cosmetic problem. We just decided to document the correct values for each setting. So in actuality, it is the Windows 2000 documentation that is incorrect as it was written using the original specs and did not take the switch into account. I’ll get that fixed.

Question

Is there a way to control the number, verbosity, or contents of the DFSR cluster debug logs (DfsrClus_nnnnn.log and DfsrClus_nnnnn.log.gz in %windir%\debug)?

Answer

Nope, sorry. It’s all statically defined:

  • Severity = 5
  • Max log messages per log = 10000
  • Max number of log files = 999

Question

In your previous article you say that any registry modifications should be completed with a resource restart (take the resource offline and bring it back online) instead of a direct service restart. However, the official whitepaper (on page 16) says that the CA service should be restarted by using "net stop certsvc && net start certsvc".

Also, I want to clarify something about a clustered CA database backup/restore. Say a DB was damaged or destroyed, and I have a full backup of the CA DB. Before restoring, do I stop only the AD CS service resource (cluadmin.msc), or do I stop the CA service directly (net stop certsvc)?

Answer

[via Rob “there's a Squatch in These Woods” Greene  – Ned]

The CertSvc service has no idea that it belongs to a cluster. That’s why you set up the CA as a generic service within Cluster Administrator and configure the CA registry hive within Cluster Administrator.

When you update the registry keys on the active CA cluster node, the Cluster service monitors the registry key changes. When the resource is taken offline, the Cluster service makes a new copy of the registry keys so that the other node gets the update. When you stop and start the CA service directly, the Cluster service has no idea why the service was stopped and started, since it was done outside of the cluster, and those registry key settings are never updated on the stand-by node. General guidance around clusters is to manage the resource state (stop/start) within Cluster Administrator and not through Services.msc, NET STOP, SC, etc.

As far as the CA Database restore: just logon to the Active CA node and run the certutil or CA MMC to perform the operation. There’s no need to touch the service manually.

Other stuff

The Microsoft Premier Field Organization has started a new blog that you should definitely be reading.

Welcome to your nightmare (Thanks Mark!)

Totally immature and therefore funny. Doubles as a gender test.

Speaking of George Lucas re-imaginings, check out this awesome shot-by-shot comparison of Raiders and 30 other previous adventure films:


Indy whipped first!

I am completely addicted to Panzer Corps; if you ever played Panzer General in the 90’s, you will be too.

Apropos throwback video gaming and even more re-imagining, here is Battlestar Galactica as a 1990’s RPG:

   
The mail sack becomes meta of meta of meta

Like Legos? Love Simon Pegg? This is for you.

Best sci-fi books of 2011, according to IO9.

What’s your New Year’s resolution? Mine is to stop swearing so much.

 

Until next time,

- Ned “$#%^&@!%^#$%^” Pyle
